MLU
Software / App
Mentioned in 2 videos
A benchmark, considered flawed but fascinating, that covers aptitude and knowledge across various domains.
Videos Mentioning MLU

The Ultimate Guide to Prompting - with Sander Schulhoff from LearnPrompting.org
Latent Space
A benchmark used to test the efficacy of role prompting, where Sander's experiments showed the 'idiot prompt' outperformed the 'genius prompt'.

Gemini 2.5 Pro - It’s a Darn Smart Chatbot … (New Simple High Score)
AI Explained
A benchmark, considered flawed but fascinating, that covers aptitude and knowledge across various domains.