Sigmoid Attention

Study / Research

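A recent paper on sigmoid attention, which replaces the softmax normalization in attention with an elementwise sigmoid. It studies the distribution of logits and attention weights and is relevant to improving long-context handling in language models.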
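Softmax attention normalizes the scaled logits across all keys, so the weights always sum to 1 and compete with one another. Sigmoid attention squashes each logit on its own, so each weight is computed independently. The plain-Python sketch below contrasts the two for a single query. The function names and the default bias of -log(n) are assumptions for illustration, not necessarily the formulation used in the paper this entry refers to.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _stable_sigmoid(x):
    # Branch on sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _weighted_sum(weights, values):
    # Output = sum_i w_i * v_i, computed per output dimension.
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def _scaled_logits(q, keys):
    # Scaled dot-product scores q . k_i / sqrt(d).
    scale = 1.0 / math.sqrt(len(q))
    return [_dot(q, k) * scale for k in keys]


def softmax_attention(q, keys, values):
    """Standard attention for one query: weights are normalized across keys."""
    logits = _scaled_logits(q, keys)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # weights sum to 1
    return weights, _weighted_sum(weights, values)


def sigmoid_attention(q, keys, values, bias=None):
    """Sigmoid attention for one query: each weight is squashed independently.

    `bias` defaults to -log(n), with n the number of keys. This default is an
    assumption for illustration, not necessarily the paper's choice.
    """
    logits = _scaled_logits(q, keys)
    if bias is None:
        bias = -math.log(len(keys))
    # Each weight lies in (0, 1) mathematically; there is no normalization step.
    weights = [_stable_sigmoid(s + bias) for s in logits]
    return weights, _weighted_sum(weights, values)


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(softmax_attention(q, keys, values)[0])
    print(sigmoid_attention(q, keys, values)[0])
```

Because sigmoid weights are not normalized, appending keys leaves the existing weights unchanged (for a fixed bias), whereas softmax shrinks every existing weight. The flip side is that the total sigmoid weight grows with the number of keys; a sequence-length-dependent bias such as the -log(n) assumed here is one way to keep that total in check.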
