Humanity's Last Exam
Software / App
A benchmark designed to test AI models on a comprehensive set of intense tasks, with the implication that passing them signifies AGI.
Mentioned in 2 videos
Videos Mentioning Humanity's Last Exam

Claude Opus-4.7 Just Dropped, And...
Nick Saraev
A benchmark designed to test AI models on a comprehensive set of intense tasks, with the implication that passing them signifies AGI.

GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies
AI Explained
A benchmark measuring arcane knowledge and advanced reasoning. Opus 4.7, Mythos, and Gemini 3.1 Pro all outscore GPT-5.5 on this benchmark.