HellaSwag

Study / Research

A widely used NLP benchmark for commonsense sentence completion. It is mentioned as one of the public evaluations that IMB has reviewed and cleaned for ambiguity and data contamination.

Mentioned in 2 videos