MMLU Benchmark

Software / App

A multiple-choice benchmark (Massive Multitask Language Understanding) used to evaluate AI models. The speaker describes it as flawed, more a memorization challenge than a true test of reasoning.

Mentioned in 1 video