Sigmoid Attention
Study / Research | Mentioned in 1 video
A recent paper that studies the distribution of attention logits and attention weights, with relevance to improving long-context handling in language models.