Terminal Bench
Study / Research
A benchmark used to measure progress in raw intelligence for coding tasks, assessing capabilities relevant to software engineers.
Mentioned in 2 videos
Videos Mentioning Terminal Bench

AI Dev 26 x SF | A Fireside Chat with OpenAI's Marc Manara
DeepLearningAI
A benchmark used to measure progress in raw intelligence for coding tasks, assessing capabilities relevant to software engineers.

AI Dev 26 x SF | Ara Khan: Evals Are Broken Use Them Anyway
DeepLearningAI
A benchmark developed by Stanford that focuses on real-world software engineering tasks, including database issues, race conditions, and front-end bugs.