GSM8K
Concept
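A benchmark of grade-school math word problems used to evaluate mathematical reasoning. Like HumanEval and similar benchmarks, it is now considered contaminated (test problems have leaked into training data) or saturated (top models score near the ceiling), so high scores no longer indicate breakthrough performance.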
Mentioned in 2 videos
Videos Mentioning GSM8K

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka
Latent Space
A math reasoning benchmark, described alongside others like HumanEval as contaminated or saturated, so high scores no longer indicate breakthrough performance.

State of the Art: Training 70B LLMs on 10,000 H100 clusters
Latent Space
A math reasoning benchmark, noted for having some 'weird' qualities, so its scores require careful interpretation.