Fiction.LiveBench for long context, deep comprehension

Tool / Product · Mentioned in 1 video

A benchmark that evaluates how well AI models process and understand long contexts. It is cited as a comparison in which LLaMA 4 performed unfavorably.