MMLU Benchmark

Tool / Product

Mentioned in 1 video

MMLU (Massive Multitask Language Understanding) is a multiple-choice benchmark used to evaluate AI models. The speaker describes it as flawed, arguing that it is more of a memorization challenge than a true test of reasoning.