Ruler Suite
Software / App
Mentioned in 1 video
A comprehensive benchmark suite for evaluating long-context models, covering multi-needle retrieval, variable tracking, and summary statistics.