Self-Attention

Concept

An attention mechanism in which a model relates different positions of a single sequence to one another in order to compute a representation of that sequence; it is central to the Transformer architecture.
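For illustration, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the form used in the Transformer. The projection matrices, dimensions, and variable names are illustrative assumptions, not taken from the videos.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) input sequence.
    W_q, W_k, W_v: (d_model, d_k) projection matrices (illustrative).
    Returns: (seq_len, d_k) attended representations.
    """
    Q = X @ W_q          # queries, one per position
    K = X @ W_k          # keys, one per position
    V = X @ W_v          # values, one per position
    d_k = Q.shape[-1]
    # Every position scores every other position of the same sequence.
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # attention distribution per position
    return weights @ V                   # weighted sum of value vectors

# Toy usage: 4 tokens, model width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Because the queries, keys, and values are all projected from the same input X, each output row is a mixture of information from every position in that one sequence, which is what distinguishes self-attention from attention over a separate source sequence.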

Mentioned in 2 videos