Mixture-of-Experts (MoE) is an artificial neural network architecture in which each input is processed by only a subset of specialized subnetworks, called experts, selected by a small gating (router) network based on that input; different experts can thereby specialize in different kinds of inputs or tasks. Because only a few experts are activated for any given input, an MoE model can hold a very large total number of parameters while using only a fraction of them per input.

In principle, this sparse design lets models scale to very large parameter counts without a proportional increase in computation per input, since only the selected experts run for each token. High-profile models that use this architecture include Gemini 1.5 and Mixtral 8x7B.
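
The following is a minimal sketch of the idea, not the implementation used by any particular model: a sparsely gated MoE layer with top-k routing written with PyTorch. The class name MoELayer, the feed-forward expert design, and the default values num_experts=8 and top_k=2 are illustrative assumptions chosen for clarity.

```python
# Minimal sketch of a sparsely gated Mixture-of-Experts layer (assumed PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model, d_hidden, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward subnetwork.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])
        # The gating (router) network scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                       # x: (batch, seq, d_model)
        scores = self.router(x)                 # (batch, seq, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)   # normalize over the chosen experts only
        out = torch.zeros_like(x)
        # Route each token to its top-k experts and combine their outputs.
        for slot in range(self.top_k):
            idx = top_idx[..., slot]            # (batch, seq) expert index per token
            w = weights[..., slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = (idx == e)
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out
```

With these assumed defaults (8 experts, top-2 routing), only two of the eight expert subnetworks run for each token, so the layer's per-token compute stays close to that of a single feed-forward block even though the total parameter count grows with the number of experts.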