Sigmoid Attention

Study / Research

A recent paper that studies how attention logits and the resulting attention weights are distributed, with relevance to improving long-context handling in language models.
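To make the link between logits, attention weights, and context length concrete, here is a minimal sketch. It assumes, based only on the entry's title, that "sigmoid attention" means replacing the row-wise softmax over attention logits with an elementwise sigmoid; the entry itself does not spell this out. The function names, the `bias` parameter, and the toy logits are hypothetical and chosen for illustration, not taken from the paper.

```python
import math

def softmax_weights(logits):
    """Standard softmax: weights are coupled and always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_weights(logits, bias=0.0):
    """Elementwise sigmoid (assumed form): each weight depends only on its own logit."""
    return [1.0 / (1.0 + math.exp(-(x + bias))) for x in logits]

def with_distractors(n):
    """Toy logits: one relevant key (logit 4.0) plus n low-scoring keys (logit 0.0)."""
    return [4.0] + [0.0] * n

for n in (10, 1000):
    logits = with_distractors(n)
    relevant_softmax = softmax_weights(logits)[0]
    relevant_sigmoid = sigmoid_weights(logits)[0]
    print(f"n={n}: softmax={relevant_softmax:.4f}, sigmoid={relevant_sigmoid:.4f}")
```

In this toy setup the softmax weight on the relevant key shrinks as more low-scoring keys are added, because all weights share a fixed total of 1, while the sigmoid weight on that key stays the same. The trade-off is that sigmoid weights no longer sum to 1, so the total attention mass grows with sequence length unless something like a negative `bias` compensates.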