Mixture-of-Experts (MoE) is a neural network architecture in which a learned gating (router) network activates only a subset of the network, specialized sub-models called experts, for each input, so that inputs requiring different skillsets are handled by different experts. Loosely, it can be compared to a LangChain-style system in which an agent dispatches requests to expert models and functions (tools), except that in MoE the routing is learned and happens inside the network itself, typically per token.
In theory, this design makes large models cheaper to train and faster at inference, since only a fraction of the parameters is used for each token even though the total parameter count can be very large. High-profile models that use this architecture include Gemini 1.5, which pairs a sparse MoE design with a very long context window, and Mixtral 8x7B from Mistral AI.
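
To make the routing idea concrete, here is a minimal sketch of a sparsely gated MoE feed-forward layer in PyTorch. The class names, layer sizes, and the choice of two active experts per token are illustrative assumptions for this sketch, not a description of how Gemini 1.5 or Mixtral are actually implemented.

```python
# Minimal sparse MoE layer sketch (illustrative, not a production implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One feed-forward expert: a small MLP applied to individual tokens."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SimpleMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)  # the gating network
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> one row per token
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                       # (n_tokens, n_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = F.softmax(weights, dim=-1)               # mixing weights over chosen experts

        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            # Only the tokens routed to expert i are processed by it (sparse activation).
            token_idx, slot = (chosen == i).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape(x.shape)


# Usage: a batch of 4 sequences of 16 tokens with model width 64.
moe = SimpleMoE(d_model=64, d_hidden=256)
y = moe(torch.randn(4, 16, 64))
print(y.shape)  # torch.Size([4, 16, 64])
```

The key point the sketch illustrates is sparsity: every token passes through the router, but only two of the eight experts run for it, so compute per token stays roughly constant as more experts (and thus more parameters) are added.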