MMLU

Software / App

A benchmark, considered flawed but fascinating, that covers aptitude and knowledge across various domains.

Mentioned in 2 videos

Latent Space

A benchmark used to test the efficacy of role prompting; in Sander's experiments, the 'idiot prompt' outperformed the 'genius prompt'.

AI Explained

A benchmark, considered flawed but fascinating, that covers aptitude and knowledge across various domains.