HellaSwag

Software / App

A benchmark dataset designed to test commonsense reasoning in LLMs: the model must pick the most plausible ending for a sentence from a set that includes adversarially generated wrong endings.

Mentioned in 1 video