Needle in a Haystack

ConceptMentioned in 2 videos

A long-context evaluation method for LLMs, criticized for not measuring holistic context use and for being relatively easy for models to 'trick' without true reasoning.