Humanity's Last Exam
Study / Research
A benchmark designed to test knowledge of obscure topics. Claude Mythos, with tools, answered nearly two-thirds of the questions correctly, surpassing other frontier models.
Mentioned in 1 video