Chinchilla paper
Book
A paper on compute-optimal training. The discussion notes that its findings apply specifically to compute-optimal pre-training, and highlights a shift in focus toward inference-compute optimality.
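To make "compute-optimal pre-training" concrete, here is a minimal sketch that splits a fixed training FLOP budget between model size and data. It assumes two common approximations associated with the Chinchilla results: training compute C ≈ 6·N·D (N parameters, D training tokens) and a roughly fixed ratio of about 20 tokens per parameter. The function name `chinchilla_optimal` and its `tokens_per_param` default are illustrative, not taken from the paper's code.

```python
import math


def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a pre-training FLOP budget into (parameters, tokens).

    Assumes C ~= 6 * N * D and D ~= tokens_per_param * N (rule-of-thumb
    approximations, not exact fits from the paper).
    """
    # Substituting D = k * N into C = 6 * N * D gives C = 6 * k * N**2,
    # so N = sqrt(C / (6 * k)).
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Example: the budget implied by a 70B-parameter model trained on 1.4T tokens.
budget = 6 * 70e9 * 1.4e12
params, tokens = chinchilla_optimal(budget)
print(f"params ~ {params:.3g}, tokens ~ {tokens:.3g}")
```

This objective only counts training compute. Once expected inference cost is added, the optimum tends to move toward smaller models trained on more tokens, which is the shift toward inference-compute optimality mentioned above.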
Mentioned in 2 videos
Videos Mentioning Chinchilla paper

State of the Art: Training 70B LLMs on 10,000 H100 clusters
Latent Space
A research paper on compute-optimal scaling laws for language models, referenced in the discussion of how CARBS can learn similar scaling laws for various hyperparameters.
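CARBS's internals are not described here. As a generic illustration, and not CARBS's actual method, a hyperparameter scaling law is often modelled as a power law h*(C) = a·C^b, where C is the compute budget and h* is the best value found at that budget. Such a law can be fitted by ordinary least squares in log-log space. The function `fit_power_law` and the synthetic learning-rate data below are hypothetical.

```python
import math


def fit_power_law(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b). All inputs must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope of the log-log regression line is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    # Intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical data: best learning rate observed at several compute budgets.
budgets = [1e18, 1e19, 1e20, 1e21]
best_lr = [0.5 * c ** -0.1 for c in budgets]
a, b = fit_power_law(budgets, best_lr)
# Extrapolate the fitted law to a larger budget.
print(a * 1e23 ** b)
```

Fitting in log space turns the power law into a straight line, so the exponent is simply the regression slope. The extrapolation is only as trustworthy as the assumption that the power-law form holds beyond the measured budgets.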

2024 Year in Review: The Big Scaling Debate, the Four Wars of AI, Top Themes and the Rise of Agents
Latent Space
A paper on compute-optimal training. The discussion notes that its findings apply specifically to compute-optimal pre-training, and highlights a shift in focus toward inference-compute optimality.