Self-Attention

Concept · Mentioned in 1 video

A variant of attention in which a model attends to different positions of a single sequence in order to compute a representation of that same sequence. Because the queries, keys, and values all come from the same sequence, it is sometimes called intra-attention; it is central to the Transformer architecture.
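As a concrete illustration, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the form used in the Transformer. The function name and the projection matrices (`Wq`, `Wk`, `Wv`) are illustrative placeholders, not from any particular library.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over one sequence.

    X:          (seq_len, d_model) input token representations
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns:    (seq_len, d_k) new representations of the same sequence.
    """
    Q = X @ Wq  # queries come from the sequence itself...
    K = X @ Wk  # ...and so do the keys...
    V = X @ Wv  # ...and the values: this is what makes it "self"-attention
    # Similarity of every position to every other, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Numerically stable softmax over positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all positions' values
    return weights @ V

# Tiny usage example with random weights
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one new representation per input position
```

The key point the sketch makes visible: unlike encoder-decoder attention, there is no second sequence here; every position attends over the same sequence it belongs to.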