Ruler Suite

Software / App (mentioned in 1 video)

A benchmark suite for evaluating long-context language models. It is more comprehensive than a single needle-in-a-haystack test, covering tasks such as multi-needle retrieval, variable tracking, and summary statistics.
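To make "multi-needle retrieval" concrete, here is a minimal toy sketch of how such a test case could be built and scored. It is not the suite's own code: the function names, the filler text, the "magic number" phrasing, and the substring-match scoring are all assumptions chosen for illustration.

```python
import random


def build_multi_needle_prompt(num_needles=3, num_filler=200, seed=0):
    """Build a toy haystack of filler text with several hidden key/value needles.

    Hypothetical helper for illustration only; not part of any benchmark's API.
    """
    rng = random.Random(seed)
    filler = "The grass is green. The sky is blue. The sun is yellow."
    lines = [filler] * num_filler
    needles = {}
    for i in range(num_needles):
        key = f"key-{i}"
        value = str(rng.randint(100000, 999999))
        needles[key] = value
        # Insert each needle at a random position in the haystack.
        pos = rng.randint(0, len(lines))
        lines.insert(pos, f"The special magic number for {key} is {value}.")
    question = (
        "What are the special magic numbers for " + ", ".join(needles) + "?"
    )
    prompt = "\n".join(lines) + "\n\n" + question
    return prompt, needles


def score_response(response, needles):
    """Return the fraction of needle values that appear verbatim in the response."""
    found = sum(1 for value in needles.values() if value in response)
    return found / len(needles)


if __name__ == "__main__":
    prompt, needles = build_multi_needle_prompt(num_needles=3, num_filler=50)
    # A real evaluation would send `prompt` to a model; here we fake a perfect answer.
    fake_response = " ".join(needles.values())
    print(score_response(fake_response, needles))
```

Raising `num_filler` lengthens the context and raising `num_needles` increases how many facts must be recalled at once, which are the two knobs a test of this kind would typically vary.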