Fiction.LiveBench (long-context deep comprehension)

Software / App

A benchmark that evaluates how well AI models process and understand long contexts. It is cited to show that Llama 4 compares unfavorably with other models.

Mentioned in 1 video