HellaSwag
Study / Research
A common NLP benchmark, mentioned as one of the public evaluations that Imbue has reviewed and cleaned for ambiguity and data contamination.
Mentioned in 2 videos
Videos Mentioning HellaSwag

State of the Art: Training 70B LLMs on 10,000 H100 clusters
Latent Space
A common NLP benchmark, mentioned as one of the public evaluations that Imbue has reviewed and cleaned for ambiguity and data contamination.

The Origin and Future of RLHF: the secret ingredient for ChatGPT - with Nathan Lambert
Latent Space
A benchmark mentioned as one of the evaluations used on the Hugging Face leaderboard.