Sync-Attention

Study / Research

A paper showing that language models tend to assign disproportionately high attention weight to the first few tokens of a sequence, a finding that suggests a direction for future research on extending long-context capabilities.
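As a rough illustration of what "more weight on the first few tokens" means, the sketch below measures the share of attention that later positions place on the first k positions. It is not taken from the paper: the function names `causal_softmax` and `initial_token_mass` and the toy score matrices are all hypothetical, and the "biased" scores are hand-built to favor position 0 rather than produced by a real model.

```python
import math


def causal_softmax(scores):
    """Row-wise softmax in which query i may only attend to keys 0..i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Masked (future) positions receive zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


def initial_token_mass(weights, k):
    """Average share of attention that queries at positions >= k
    place on the first k keys."""
    rows = weights[k:]
    if not rows:
        raise ValueError("k must be smaller than the sequence length")
    return sum(sum(row[:k]) for row in rows) / len(rows)


# Toy data (assumption): 4 tokens, all raw scores equal.
uniform_scores = [[0.0] * 4 for _ in range(4)]

# Toy data (assumption): same matrix, but every query scores key 0 higher.
biased_scores = [[2.0, 0.0, 0.0, 0.0] for _ in range(4)]

uniform_mass = initial_token_mass(causal_softmax(uniform_scores), k=1)
biased_mass = initial_token_mass(causal_softmax(biased_scores), k=1)
```

Comparing `biased_mass` with `uniform_mass` shows how a preference for the first token raises this share. Applying the same measurement to a real model would require extracting its attention weights, which this sketch does not attempt.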