DeepSeek
Book
The original DeepSeek paper is discussed for its thorough scaling analysis, in particular its approach to fitting scaling laws for the optimal batch size and learning rate.
Mentioned in 1 video
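A minimal sketch of the generic idea behind fitting a hyperparameter scaling law, assuming the relationship takes a power-law form `value ≈ a * C**b` in the compute budget `C`. The sketch fits it by ordinary least squares on log-transformed data and then extrapolates. The helper names (`fit_power_law`, `predict`) and the synthetic data points are hypothetical illustrations, not the paper's measurements or its exact fitting procedure.

```python
import math


def fit_power_law(compute, values):
    """Fit values ~= a * compute**b via least squares on (log C, log value)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the power-law exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space gives log(a).
    log_a = mean_y - b * mean_x
    return math.exp(log_a), b


def predict(a, b, c):
    """Extrapolate the fitted power law to a new compute budget c."""
    return a * c ** b


# Hypothetical synthetic (compute budget, best hyperparameter) pairs generated
# from known power laws so the fit can be checked; NOT data from the paper.
compute = [1e17, 1e18, 1e19, 1e20]
best_batch = [0.3 * c ** 0.33 for c in compute]      # batch size grows with compute
best_lr = [0.3 * c ** -0.125 for c in compute]       # learning rate shrinks with compute

a_bs, b_bs = fit_power_law(compute, best_batch)
a_lr, b_lr = fit_power_law(compute, best_lr)

print(f"batch size ~= {a_bs:.4f} * C^{b_bs:.4f}")
print(f"learning rate ~= {a_lr:.4f} * C^{b_lr:.4f}")
print(f"extrapolated batch size at C=1e21: {predict(a_bs, b_bs, 1e21):.3e}")
```

The log-log transform turns the power law into a straight line, so a plain linear regression recovers the exponent as the slope. In practice each (compute, value) pair would come from a sweep of small training runs, with the "best" value chosen among near-optimal configurations.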