Benchmark that tests multiple languages and is designed to resist training-data contamination.
Mentioned in 1 video
AI Explained