DCLM
Software / App
Mentioned as a model using the Z-loss trick for stabilizing output softmaxes.
Mentioned in 3 videos
Videos Mentioning DCLM

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 3: Architectures
Stanford Online
Mentioned as a model using the Z-loss trick for stabilizing output softmaxes.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 14: Data
Stanford Online
A model mentioned in an experiment showing that high-quality data yields better results early in training, while lower-quality data can also be effective over very long training runs.

Inference, Diffusion, World Models, and More | YC Paper Club
Y Combinator
A dataset used in experiments that simulate a data-constrained world.