Chinchilla trap

Concept

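The idea that following the Chinchilla scaling laws, which give the best benchmark ("paper") performance for a fixed training compute budget, may not be optimal for models intended for widespread inference use. Because serving cost grows with model size, training a smaller model for longer than the Chinchilla-optimal point can be the better choice.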
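The trade-off can be illustrated with rough arithmetic. The sketch below is not from the source: it assumes the common rules of thumb of about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per inference token, and it assumes (hypothetically) that a smaller model trained on more tokens reaches quality comparable to the larger one. The function names and the 35B configuration are made up for illustration.

```python
# Rule-of-thumb FLOP counts for a dense transformer (approximations, not exact).
TRAIN_FLOPS_PER_PARAM_TOKEN = 6  # forward + backward pass
INFER_FLOPS_PER_PARAM_TOKEN = 2  # forward pass only


def lifetime_flops(params, train_tokens, inference_tokens):
    """Training compute plus the compute spent serving the model."""
    train = TRAIN_FLOPS_PER_PARAM_TOKEN * params * train_tokens
    serve = INFER_FLOPS_PER_PARAM_TOKEN * params * inference_tokens
    return train + serve


def break_even_inference_tokens(big, small):
    """Inference tokens at which the smaller, longer-trained model
    becomes cheaper overall. Each argument is (params, train_tokens)."""
    (n_big, d_big), (n_small, d_small) = big, small
    extra_training = TRAIN_FLOPS_PER_PARAM_TOKEN * (n_small * d_small - n_big * d_big)
    saving_per_token = INFER_FLOPS_PER_PARAM_TOKEN * (n_big - n_small)
    return extra_training / saving_per_token


# Roughly Chinchilla-style: about 20 training tokens per parameter.
chinchilla_style = (70e9, 1.4e12)
# Hypothetical alternative: half the parameters, three times the tokens.
# Whether this matches the larger model's quality is an assumption.
overtrained_small = (35e9, 4.2e12)

break_even = break_even_inference_tokens(chinchilla_style, overtrained_small)
print(f"break-even at {break_even:.2e} inference tokens")
```

Under these assumptions the smaller model costs more to train but is cheaper in total once it has served about 4.2 trillion tokens, which is the sense in which optimizing only for training compute can be a trap for heavily used models.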

Mentioned in 1 video