WildBench

Software / App

A benchmark that uses LLMs as judges, guided by a checklist or rubric, to evaluate model responses to prompts. The explicit criteria are meant to make the evaluations better defined and more consistent.
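The checklist idea can be illustrated with a minimal sketch. This is not WildBench's actual code or scoring formula: the names `checklist_score` and `stub_judge` are hypothetical, and the judge here is a simple keyword check standing in for a real LLM call.

```python
from typing import Callable, List

# Hypothetical judge signature: (prompt, response, criterion) -> True if the criterion is met.
# In a real setup this would wrap a call to an LLM; that part is assumed, not shown.
Judge = Callable[[str, str, str], bool]


def checklist_score(prompt: str, response: str, checklist: List[str], judge: Judge) -> float:
    """Return the fraction of checklist criteria the judge marks as satisfied."""
    if not checklist:
        raise ValueError("checklist must contain at least one criterion")
    passed = sum(1 for criterion in checklist if judge(prompt, response, criterion))
    return passed / len(checklist)


def stub_judge(prompt: str, response: str, criterion: str) -> bool:
    """Stand-in for an LLM judge: passes when the criterion text appears in the response."""
    return criterion.lower() in response.lower()


if __name__ == "__main__":
    score = checklist_score(
        "Explain list comprehensions.",
        "A Python list comprehension builds a list.",
        ["python", "example"],  # illustrative criteria only
        stub_judge,
    )
    print(score)
```

Scoring each criterion separately, rather than asking for one overall grade, is what makes the judgment easier to inspect: every pass or fail traces back to a named checklist item.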

Mentioned in 1 video