DeepSeek

Book

The original DeepSeek paper is discussed for its thorough scaling analysis, in particular its approach to fitting scaling laws that predict the optimal batch size and learning rate.
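The entry does not say how those fits are carried out. As a rough illustration only, the sketch below assumes that each optimum (batch size or learning rate) follows a power law in the compute budget C, y ≈ k · C^α, and fits k and α by linear least squares in log-log space. The function name `fit_power_law`, the constants, and the sweep data are all hypothetical and synthetic; they are not taken from the paper.

```python
import numpy as np


def fit_power_law(compute, optimum):
    """Fit optimum ~= k * compute**alpha by least squares on log(optimum) vs log(compute)."""
    log_c = np.log(np.asarray(compute, dtype=float))
    log_y = np.log(np.asarray(optimum, dtype=float))
    # A degree-1 polyfit returns [slope, intercept] = [alpha, log(k)].
    alpha, log_k = np.polyfit(log_c, log_y, deg=1)
    return float(np.exp(log_k)), float(alpha)


# Hypothetical sweep: compute budgets and the best batch size found at each one.
# Synthetic data built from made-up constants plus multiplicative noise.
rng = np.random.default_rng(0)
compute = np.logspace(17, 20, num=8)
true_k, true_alpha = 0.3, 0.33
best_batch = true_k * compute**true_alpha * np.exp(rng.normal(0.0, 0.05, size=compute.size))

k, alpha = fit_power_law(compute, best_batch)
print(f"fitted: B_opt ~= {k:.3g} * C^{alpha:.3f}")

# Extrapolate the fitted law to a larger, hypothetical budget.
print(f"predicted B_opt at C=1e21: {k * 1e21**alpha:.3g}")
```

The same fit applies unchanged to learning rate, where the fitted exponent would typically come out negative (larger budgets favoring smaller learning rates). Fitting in log space turns the power law into a straight line, so ordinary linear regression suffices and multiplicative noise in the measured optima becomes roughly additive.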

Mentioned in 1 video