Attention Sinks
Study / Research
A paper showing that language models allocate a disproportionately large share of attention mass to the first few tokens of a sequence, and suggesting this behavior as a focus for future research on extending long-context capabilities.
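This effect is typically quantified by measuring how much attention mass each key position receives, averaged over the queries that can attend to it. A minimal sketch of such a diagnostic, using toy random scores rather than a trained model's attention maps (the helper `mean_attention_received` is a hypothetical name, not from the paper):

```python
import numpy as np

def mean_attention_received(attn):
    """Given a causal attention matrix attn[q, k] (rows sum to 1),
    return the attention mass each key position receives, averaged
    over the queries q >= k that are allowed to attend to it."""
    n = attn.shape[0]
    out = np.zeros(n)
    for k in range(n):
        out[k] = attn[k:, k].mean()
    return out

# Toy causal attention built from random scores (illustration only;
# real "sink" behavior is observed in trained transformers).
rng = np.random.default_rng(0)
n = 8
scores = rng.normal(size=(n, n))
scores[np.triu_indices(n, k=1)] = -np.inf  # causal mask: no looking ahead
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax

print(mean_attention_received(attn))
```

Applied to a trained model's attention maps, a pronounced peak at the earliest positions is the signature the paper describes.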