Mixture-of-Experts (MoE) is a neural network architecture that routes different inputs through different subsets of the network. It can be loosely pictured as a system in which a small router (gating network) dispatches each input to a handful of specialized expert sub-networks and combines their outputs.
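
A minimal sketch of what such a layer might look like, written in PyTorch as an assumption (none of the models named below necessarily implement it this way): a linear router scores each token, the top-k experts process it, and their outputs are blended by the routing weights. Names like `MoELayer`, `n_experts`, and `top_k` are illustrative, not taken from any real codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts feed-forward layer."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router (gating network): scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Experts: independent feed-forward sub-networks.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, d_model), one token per row
        scores = self.router(x)                           # (batch, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        top_w = F.softmax(top_w, dim=-1)                  # normalize their weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(MoELayer()(x).shape)  # torch.Size([4, 64])
```

Only `top_k` of the `n_experts` sub-networks run for any given token, which is the property the rest of this section relies on.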

In theory, this architecture makes models easier to train and cheaper to run, because each prediction activates only a small fraction of the model's parameters rather than the entire network.
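
To make that concrete, here is a rough parameter count using the hypothetical `MoELayer` above: every expert contributes to the total model size, but only `top_k` of them are used per token.

```python
layer = MoELayer(d_model=64, d_hidden=256, n_experts=8, top_k=2)

# Total expert parameters stored in memory vs. those used for one token.
total = sum(p.numel() for p in layer.experts.parameters())
active = total * layer.top_k // len(layer.experts)  # 2 of 8 experts -> ~1/4 of expert weights
print(f"total expert params: {total}, active per token: {active}")
```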

High-profile models that use this architecture include Gemini 1.5, Mixtral 8x7B, and DeepSeek-V3.