Muon
Software / App
An optimizer increasingly used in training the latest open models, such as the Kimi K2 models.
Mentioned in 2 videos
Videos Mentioning Muon

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 1: Overview, Tokenization
Stanford Online
An optimizer increasingly used in training the latest open models, such as the Kimi K2 models.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 11: Scaling Laws
Stanford Online
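An optimizer that showed significant gains over Adam in small-scale experiments, particularly on the nanoGPT speedrun benchmark. It treats matrix-valued parameters (2D weight matrices) differently from the rest: their momentum update is approximately orthogonalized before being applied, while the remaining parameters, such as embeddings, biases, and gains, are typically left to an Adam-style optimizer.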
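
To make "treats matrix-valued parameters differently" concrete, here is a minimal NumPy sketch of a Muon-style update for a single weight matrix. It assumes the commonly published formulation: accumulate momentum, then approximately orthogonalize it with a few Newton-Schulz iterations. The function names `newton_schulz_orthogonalize` and `muon_step`, the defaults for `lr`, `beta`, and `steps`, and the use of plain (non-Nesterov) momentum are illustrative assumptions, not details from the lectures. The quintic coefficients are the ones used in the public Muon reference code, as best understood here.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Push the singular values of G toward 1 (roughly U @ V.T from its SVD)."""
    # Quintic iteration coefficients (assumed from the public Muon reference code).
    a, b, c = 3.4445, -4.7750, 2.0315
    # Scale by the Frobenius norm so every singular value starts at or below 1.
    X = G / (np.linalg.norm(G) + eps)
    # Iterate on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum_buf, lr=0.02, beta=0.95):
    """One hypothetical Muon-style step for a 2D weight matrix W."""
    momentum_buf = beta * momentum_buf + grad          # plain momentum (assumption)
    update = newton_schulz_orthogonalize(momentum_buf)  # matrix-specific step
    return W - lr * update, momentum_buf

# Usage: one step on a random 16 x 8 weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))
buf = np.zeros_like(W)
W, buf = muon_step(W, rng.standard_normal((16, 8)), buf)
```

The iteration only approximates orthogonalization: the singular values of the result land in a band around 1 rather than exactly at 1. Under this formulation, non-matrix parameters would be updated by a separate optimizer such as Adam.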