Needle in a Haystack

Concept

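A long-context evaluation method for LLMs in which a short fact (the "needle") is inserted into a long body of unrelated text (the "haystack") and the model is asked to retrieve it. It is criticized for not measuring holistic context use and for being relatively easy for models to pass through surface-level retrieval, without true reasoning.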
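To make the setup concrete, here is a minimal sketch of how such a test case might be built. The function names (insert_needle, build_prompt, is_retrieved), the filler text, and the needle are all hypothetical and chosen for illustration; no model is called, and real benchmarks differ in how they choose filler text and score answers.

```python
def insert_needle(filler: list[str], needle: str, depth: float) -> str:
    """Place `needle` among `filler` sentences at a relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = round(depth * len(filler))  # 0 = very start, len(filler) = very end
    sentences = filler[:position] + [needle] + filler[position:]
    return " ".join(sentences)


def build_prompt(context: str, question: str) -> str:
    """Combine the long context and the retrieval question into one prompt."""
    return f"{context}\n\nQuestion: {question}\nAnswer:"


def is_retrieved(answer: str, expected: str) -> bool:
    """Score by case-insensitive substring match (an assumed, simplistic metric)."""
    return expected.lower() in answer.lower()


# Hypothetical haystack and needle, for illustration only.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The secret code is 7421."
context = insert_needle(filler, needle, depth=0.5)
prompt = build_prompt(context, "What is the secret code?")
```

Sweeping depth and the number of filler sentences gives a grid of test cases. The substring check also hints at the criticism above: an answer can score as correct by locating and copying one sentence, without using the rest of the context.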

Mentioned in 2 videos