Fiction.LiveBench for long-context, deep comprehension
Software / App
A benchmark that evaluates how well AI models process and understand long contexts, cited here as a comparison in which Llama 4 performs unfavorably.
Mentioned in 1 video