AgentBench

Study / Research

A paper on agent evaluation, noted because examples in its appendix were found to contain non-optimal solutions.
