Self-Attention
Concept · Mentioned in 1 video
A variant of attention where a model attends to different positions of a single sequence to compute a representation of the same sequence, central to the Transformer architecture.
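The definition above can be made concrete with a minimal sketch of single-head scaled dot-product self-attention in NumPy. This is a simplified illustration, not the full Transformer layer: it omits masking, multiple heads, and output projection, and the weight matrices `Wq`, `Wk`, `Wv` are randomly initialized placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Each position in X attends to every position of the same
    sequence X, producing a new representation per position."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # queries, keys, values from one sequence
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarity between positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                            # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.standard_normal((seq_len, d_model))       # one sequence of 4 token vectors
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated representation per input position
```

The key point the code makes explicit: queries, keys, and values all come from the same input `X`, which is what distinguishes self-attention from cross-attention between two different sequences.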