SWE-Bench Pro
Study / Research
A software engineering benchmark where Claude Mythos reportedly shows a 25% improvement over Opus 4.6.
Mentioned in 1 video