Mixture-of-Experts
A sparse architecture in which a model's layers are divided into multiple 'expert' sub-networks, only a few of which are activated for each token. Used in models such as DeepSeek V3, it offers potential gains in efficiency and performance, and remains an active target of optimization work.
Videos Mentioning Mixture-of-Experts

Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI
Latent Space
An architectural approach that lets a model selectively route different inputs to different 'experts', in contrast to dense models like Llama, which activate all parameters for every token.

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka
Latent Space
An architectural direction Yi Tay is bullish on: he sees it as promising in terms of flop-to-parameter ratio, a way to 'cheat' traditional scaling laws by adding parameters at low flop cost, though he questions the tradeoffs in capabilities.
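The flop-to-parameter 'cheat' can be made concrete with some back-of-the-envelope accounting. The sketch below is illustrative, not tied to any specific model: it assumes a layer with 8 experts and top-2 routing, where per-token compute scales with the activated experts while total parameters scale with all of them.

```python
def moe_accounting(num_experts, top_k, expert_params, dense_params):
    """Compare a routed MoE layer against a dense layer of one expert's size.

    All parameter counts are in arbitrary units; per-token FLOPs are
    approximated as proportional to the parameters actually activated.
    """
    total_params = num_experts * expert_params   # what you store
    active_params = top_k * expert_params        # what each token computes with
    return {
        "total_params": total_params,
        "active_params": active_params,
        "param_multiplier_vs_dense": total_params / dense_params,
        "flop_multiplier_vs_dense": active_params / dense_params,
    }

# Hypothetical configuration: 8 experts, top-2 routing, each expert the
# size of the dense baseline.
stats = moe_accounting(num_experts=8, top_k=2, expert_params=1.0, dense_params=1.0)
print(stats)  # 8x the parameters of the dense layer, only 2x the per-token FLOPs
```

This is the sense in which MoE buys parameters cheaply: capacity grows with the expert count while per-token compute grows only with the routing fan-out.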

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
An architectural trend where models consist of multiple 'expert' sub-networks, used in models like DeepSeek V3, offering potential benefits in efficiency and performance, with ongoing optimization efforts.

A Comprehensive Overview of Large Language Models - Latent Space Paper Club
Latent Space
An architecture where multiple 'expert' networks specialize in different aspects of the input, chosen by a gating network.
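The gating mechanism described above can be sketched in a few lines. This is a minimal toy forward pass for a single token, with random linear maps standing in for the expert networks; the dimensions and top-k value are arbitrary choices, not taken from any model discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Toy stand-ins: each "expert" is just a linear map, and the gating
# network is a single linear layer producing one score per expert.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Route one token vector to its top-k experts, weighted by gate scores."""
    logits = x @ gate_w                 # gating network scores each expert
    idx = np.argsort(logits)[-top_k:]   # keep only the top-k experts
    weights = np.exp(logits[idx])
    weights /= weights.sum()            # softmax over the selected experts
    # Only the chosen experts run; their outputs are mixed by gate weight.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, idx))

x = rng.standard_normal(d_model)
y = moe_forward(x)
print(y.shape)
```

Real implementations route whole batches of tokens, add load-balancing losses so experts are used evenly, and dispatch experts across devices, but the select-then-mix structure is the same.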