Benchmark cited by OpenAI as a rigorous evaluation; tests multiple languages.
AI Explained
Latent Space