Mixture-of-Experts

Concept

Machine learning technique

Mentioned in 9 videos


Videos Mentioning Mixture-of-Experts

Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI


Latent Space

An architectural approach allowing models to selectively use different 'experts' for different tasks, contrasting with dense models like Llama.

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka


Latent Space

An architectural approach that Yi Tay is bullish on, seeing it as a promising direction for the flop-to-parameter ratio: a way to 'cheat' traditional scaling laws by adding parameters at low flop cost, though he questions its tradeoffs in capabilities.
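The flop-to-parameter tradeoff can be made concrete with some back-of-the-envelope accounting. The sizes below are hypothetical illustrations, not figures from the episode:

```python
# Hypothetical layer sizes, for illustration only (not from the episode).
d_model, d_ff = 4096, 16384          # hidden width and FFN width
n_experts, top_k = 8, 2              # experts per MoE layer; experts active per token

ffn_params = 2 * d_model * d_ff      # one FFN block (up-projection + down-projection)
dense_total = ffn_params             # dense model: every FFN parameter is used per token
moe_total = n_experts * ffn_params   # MoE: stores 8x the FFN parameters...
moe_active = top_k * ffn_params      # ...but only 2 experts' worth of flops per token

print(f"dense params / active per token: {dense_total:,} / {dense_total:,}")
print(f"MoE   params / active per token: {moe_total:,} / {moe_active:,}")
# The MoE holds 8x the parameters at only 2x the per-token FFN compute,
# which is the sense in which it 'cheats' parameter-count scaling laws.
```

This is the crude version of the argument; it ignores attention parameters, router overhead, and memory-bandwidth costs of holding all experts.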

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)


Latent Space

An architectural trend where models consist of multiple 'expert' sub-networks, used in models like DeepSeek V3, offering potential benefits in efficiency and performance, with ongoing optimization efforts.

A Comprehensive Overview of Large Language Models - Latent Space Paper Club


Latent Space

An architecture where multiple 'expert' networks specialize in different aspects of the input, chosen by a gating network.
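The gating mechanism described above can be sketched in a few lines. This is a minimal toy illustration of top-k routing, assuming a linear gate and four made-up "experts" (none of this comes from the talk itself):

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts scored by a linear gating network.

    experts: list of callables, each standing in for an 'expert' sub-network
    gate_weights: one weight vector per expert; score_i = dot(gate_weights[i], x)
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in gate_weights]
    # The gate selects the top_k experts and renormalizes their probabilities.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    probs = softmax([scores[i] for i in top])
    # Output is the gate-weighted sum of only the selected experts' outputs;
    # the other experts are never evaluated for this input.
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * y_j for o, y_j in zip(out, y)]
    return out, top

# Four toy experts: each just scales the input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[random.gauss(0, 1) for _ in range(3)] for _ in experts]

x = [0.5, -1.0, 2.0]
y, selected = moe_layer(x, experts, gate_weights, top_k=2)
print(selected, y)  # only 2 of the 4 experts run for this input
```

Real MoE layers use learned neural experts and add load-balancing losses so the gate does not collapse onto a few experts, but the routing skeleton is the same.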

Ep 18: Petaflops to the People — with George Hotz of tinycorp


Latent Space

A technique used in models like GPT-4 where multiple smaller models are combined, discussed as a way to scale beyond parameter limits.

Jensen Huang: NVIDIA - The $4 Trillion Company & the AI Revolution | Lex Fridman Podcast #494


Lex Fridman

An AI model architecture that uses multiple 'expert' networks; mentioned as an example of an AI innovation that requires anticipating hardware changes.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 1: Overview, Tokenization


Stanford Online

An architectural paradigm for building compute-efficient transformers, discussed as a topic for the current year's course.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 8: Parallelism


Stanford Online

A model architecture whose communication patterns are harder to reason about, since tokens are routed to different experts; discussed as well-suited to parallelization across GPUs.

Stanford CS25: Transformers United V6 I The Ultra-Scale Talk: Scaling Training to Thousands of GPUs


Stanford Online

A scaling technique discussed in relation to large models, with Nouamane Tazi's work spanning initiatives in this area.