Humanity's Last Exam
Concept
A benchmark of 2,500 hard questions spanning many subject domains.
Mentioned in 4 videos
Videos Mentioning Humanity's Last Exam

The Powerful Alternative To Fine-Tuning
Y Combinator

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
AI Explained

Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI
Latent Space
A benchmark that features difficult but easily gradable problems, which Noam Brown suggests limits the scope of AI evaluation to more common, measurable tasks rather than fuzzier, more complex ones.

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’ ...
AI Explained