MMLU

Software / App

A benchmark, considered flawed but fascinating, that covers aptitude and knowledge across various domains.

Mentioned in 2 videos

Videos Mentioning MMLU

Latent Space

A benchmark used to test whether role prompting works; in Sander's experiments, the 'idiot prompt' outperformed the 'genius prompt'.

AI Explained

A benchmark, considered flawed but fascinating, that covers aptitude and knowledge across various domains.