Benchmark cited as a rigorous evaluation by OpenAI; tests multiple languages.
Mentioned in 1 video
AI Explained