Mixture-of-Experts
An architectural trend in which a model comprises multiple 'expert' sub-networks, only a subset of which is activated per input. Used in models like DeepSeek V3, it trades a larger parameter count for lower per-token compute and remains the subject of ongoing optimization work.
Videos Mentioning Mixture-of-Experts

Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI
Latent Space
An architectural approach allowing models to selectively use different 'experts' for different tasks, contrasting with dense models like Llama.

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka
Latent Space
An architectural approach that Yi Tay is bullish on: he sees it as a promising direction for the flop-to-parameter ratio, a way to 'cheat' traditional scaling laws by adding parameters at low flop cost, though he questions its tradeoffs in capabilities.
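The flop-to-parameter "cheat" Yi Tay describes can be made concrete with back-of-the-envelope arithmetic. A minimal sketch, using hypothetical layer sizes (not Reka's or any real model's): an MoE feed-forward layer with 8 experts routed top-2 stores all 8 experts' parameters, but each token only runs through 2 of them.

```python
# Hypothetical dimensions chosen for illustration only.
d_model, d_ff, num_experts, top_k = 4096, 16384, 8, 2

dense_params = 2 * d_model * d_ff              # one FFN (up + down projection)
moe_total_params = num_experts * dense_params  # parameters stored in the MoE layer
moe_active_params = top_k * dense_params       # parameters actually used per token

# Per-token FLOPs scale with *active* parameters (~2 FLOPs per weight),
# so total parameter count grows 8x while per-token compute grows only 2x.
print(moe_total_params / dense_params)   # 8.0
print(moe_active_params / dense_params)  # 2.0
```

This is the sense in which MoE sidesteps dense scaling laws: capacity (total parameters) and compute (active parameters) are decoupled, and the open question Yi Tay raises is what capabilities that decoupling gives up.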

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
An architectural trend where models consist of multiple 'expert' sub-networks, used in models like DeepSeek V3, offering potential benefits in efficiency and performance, with ongoing optimization efforts.

A Comprehensive Overview of Large Language Models - Latent Space Paper Club
Latent Space
An architecture where multiple 'expert' networks specialize in different aspects of the input, chosen by a gating network.
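The gating mechanism described above can be sketched in a few lines. This is an illustrative top-k MoE layer, not any specific model's implementation; all names and sizes are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Illustrative Mixture-of-Experts layer: a gating network scores
    every expert, but only the top-k experts actually run on the input."""
    scores = x @ gate_weights                 # gate logits, one per expert
    chosen = np.argsort(scores)[-top_k:]      # indices of the top-k experts
    logits = scores[chosen]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over the chosen experts
    # Combine only the selected experts, weighted by the gate probabilities.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, chosen))

d, num_experts = 4, 8
experts = rng.standard_normal((num_experts, d, d))  # one weight matrix per expert
gate = rng.standard_normal((d, num_experts))        # the gating network
x = rng.standard_normal(d)
y = moe_layer(x, experts, gate)
```

Note the key property: the layer stores `num_experts` weight matrices but multiplies the input against only `top_k` of them, which is where the efficiency benefit in the entries above comes from.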

Ep 18: Petaflops to the People — with George Hotz of tinycorp
Latent Space
A technique rumored to be used in models like GPT-4, in which multiple smaller models are combined; discussed as a way to scale beyond the parameter limits of a single dense model.

Jensen Huang: NVIDIA - The $4 Trillion Company & the AI Revolution | Lex Fridman Podcast #494
Lex Fridman
An AI model architecture that uses multiple 'expert' networks; mentioned as an example of an AI innovation that requires anticipating hardware changes.