Humanity's Last Exam
Concept
A benchmark composed of 2,500 hard questions across domains.
Mentioned in 4 videos
Videos Mentioning Humanity's Last Exam

The Powerful Alternative To Fine-Tuning
Y Combinator
A benchmark composed of 2,500 hard questions across domains.

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
AI Explained

Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI
Latent Space
A benchmark of difficult but easily gradable problems. Noam Brown suggests that this design limits AI evaluation to common, measurable tasks rather than fuzzier, more complex ones.

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’ ...
AI Explained