MMLU

Software / App

A benchmark (Massive Multitask Language Understanding) cited as an example of the 'PhD++ problems' in AI that current models are already surpassing, in contrast to the ARC benchmarks, which ordinary people can solve.

Mentioned in 2 videos