Sync-Attention

Study / Research

A paper showing that language models concentrate a disproportionate share of attention weight on the first few tokens of the sequence, and suggesting this behavior as a focus for future research on extending long-context capabilities.
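The effect described above can be quantified directly: under causal scaled dot-product attention, measure what fraction of each query's attention mass lands on the earliest tokens. The sketch below is illustrative only (random matrices, not a real model, and not the paper's own methodology); the function name and the choice of `k` are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def first_token_attention_mass(Q, K, k=1):
    """Fraction of each query's attention that falls on the first k tokens,
    under causal scaled dot-product attention (hypothetical helper)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Causal mask: query i may only attend to keys 0..i.
    n = scores.shape[0]
    scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf
    weights = softmax(scores, axis=-1)   # shape: (n_queries, n_keys)
    return weights[:, :k].sum(axis=-1)   # per-query mass on first k tokens

# Toy demo with random projections, not learned weights.
rng = np.random.default_rng(0)
n, d = 8, 16
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
mass = first_token_attention_mass(Q, K, k=1)
print(mass.round(3))
```

In a trained model one would run this over real attention maps per head and layer; a consistently high mass on token 0 across heads is the pattern the paper highlights.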

Mentioned in 1 video