RULER Suite
Software / App
A more comprehensive benchmark suite for evaluating long-context models, covering multi-needle retrieval, variable tracking, and summary statistics.
Mentioned in 1 video