Needle in a Haystack
Concept. Mentioned in 2 videos.
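A long-context evaluation method for LLMs that tests whether a model can retrieve a specific piece of information (the "needle") placed somewhere in a large body of text (the "haystack"). It has been criticized for not measuring holistic use of the context and for being relatively easy for models to pass without true reasoning.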
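The sketch below shows how such a test is commonly assembled: a needle is inserted at a relative depth inside filler text, and retrieval is scored across a grid of context lengths and depths. The filler sentence, the passphrase needle, and all function names are hypothetical. Real setups typically use long natural-language documents and query an actual LLM where this sketch calls `lexical_baseline`. That baseline does no reasoning at all, which illustrates the criticism above: a pure retrieval probe can be passed by surface pattern matching.

```python
import re

# Hypothetical haystack filler, needle, and question (illustrative only).
FILLER = "The grass is green and the sky is blue. "
ANSWER = "violet-harbor-42"
NEEDLE = f"The secret passphrase is '{ANSWER}'. "
QUESTION = "What is the secret passphrase?"


def build_haystack(n_filler: int, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    sentences = [FILLER] * n_filler
    sentences.insert(round(depth * n_filler), NEEDLE)
    return "".join(sentences)


def build_prompt(context: str) -> str:
    """Combine the haystack and the retrieval question into one prompt."""
    return f"{context}\n\nQuestion: {QUESTION}"


def lexical_baseline(prompt: str) -> str:
    """Stand-in for a model: regex lookup of the needle's phrasing, no reasoning."""
    match = re.search(r"secret passphrase is '([^']+)'", prompt)
    return match.group(1) if match else ""


def score(answer: str) -> bool:
    """Exact-match scoring against the inserted value."""
    return answer == ANSWER


# Sweep context length (in filler sentences) and needle depth.
results = {
    (n, d): score(lexical_baseline(build_prompt(build_haystack(n, d))))
    for n in (10, 1_000, 10_000)
    for d in (0.0, 0.25, 0.5, 0.75, 1.0)
}
print(all(results.values()))  # True
```

Every cell of the grid passes, even though the "model" only matches a string pattern. This is why a perfect needle-in-a-haystack score says little about whether a model actually uses the whole context.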
Videos Mentioning Needle in a Haystack

State of the Art: Training 70B LLMs on 10,000 H100 clusters
Latent Space
Discussed as a long-context evaluation method for LLMs, with the criticism that it does not measure holistic context use and that models can pass it without true reasoning.

How to train a Million Context LLM — with Mark Huang of Gradient.ai
Latent Space
A standard benchmark for evaluating long-context models, assessing their ability to retrieve specific information from large amounts of text.