Terminal Bench

Software / App

A benchmark for evaluating AI agents' ability to perform tasks in a terminal environment.
