Sync-Attention
Study / Research
A paper showing that language models tend to assign disproportionately high attention weight to the first few tokens of a sequence, and suggesting this behavior as a focus for future research on extending long-context capabilities.
Mentioned in 1 video
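The effect described above can be quantified directly from a model's attention weights. The sketch below is a minimal illustration, not code from the paper: it builds a synthetic causal attention matrix with extra score on the first token (a hypothetical stand-in for real model attention) and measures how much of each query's attention mass lands on the earliest key positions.

```python
import numpy as np

def first_token_attention_mass(attn, k=4):
    """Fraction of each query's attention mass on the first k key
    positions, averaged over all queries.
    attn: (num_queries, num_keys) matrix whose rows sum to 1."""
    return float(attn[:, :k].sum(axis=1).mean())

# Synthetic causal attention scores: boost token 0 to mimic the
# "first tokens get more weight" pattern, then apply a causal mask
# and softmax. This is illustrative data, not model output.
n = 8
scores = np.zeros((n, n))
scores[:, 0] = 3.0                        # extra score on the first token
mask = np.triu(np.ones((n, n)), k=1).astype(bool)
scores[mask] = -np.inf                    # causal mask: no attending ahead
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)   # row-wise softmax

print(first_token_attention_mass(attn, k=1))
```

Under a uniform attention baseline the first token would receive well under 1/n of the average mass once later rows are included; here the printed fraction is far larger, which is the kind of skew the paper points to.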