Machine Learning Evaluation
2 video summaries
Videos About Machine Learning Evaluation
The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals
Latent Space
Best of 2024 in Agents (from #1 on SWE-Bench Full, Prof. Graham Neubig of OpenHands/AllHands)
Latent Space