WildBench
Software / App
A benchmark that uses LLMs as judges, guided by a checklist or rubric, to evaluate model responses to prompts, with the aim of making evaluations more clearly defined.
Mentioned in 1 video