Humanity's Last Exam

Concept

benchmark for large language models

Mentioned in 5 videos

lastexam.ai


Videos Mentioning Humanity's Last Exam

The Powerful Alternative To Fine-Tuning

Y Combinator

A benchmark of 2,500 hard questions spanning many domains.

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI

AI Explained

Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI

Latent Space

A benchmark of difficult but easily gradable problems. Noam Brown suggests that this reliance on easy grading limits the scope of AI evaluation to more common, measurable tasks rather than fuzzier, more complex ones.

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax + ‘Superintelligence in 2027’ ...

AI Explained

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 12: Evaluation

Stanford Online

A benchmark of multimodal, multi-subject questions designed to be extremely difficult for models. It keeps a private held-out set to mitigate training contamination.