Multi-head attention

Concept · Mentioned in 1 video

A component of Transformer models in which every attention head has its own query, key, and value projections. Mentioned in contrast to more efficient attention schemes such as grouped-query attention (GQA) and multi-query attention (MQA), which share key/value heads across query heads to reduce KV cache size.
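
A minimal PyTorch sketch of standard multi-head attention (shapes and names here are illustrative, not drawn from any particular model). The point of the example is the KV cache cost: because each of the H heads produces its own K and V, the cache grows in proportion to H, which is what GQA and MQA cut down.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, D, H = 2, 8, 64, 8           # batch, sequence length, model dim, heads (all illustrative)
hd = D // H                        # per-head dimension

x = torch.randn(B, T, D)
Wq, Wk, Wv, Wo = (torch.randn(D, D) / D**0.5 for _ in range(4))

# Project and split into heads: in MHA every head gets its own Q, K, and V.
q = (x @ Wq).view(B, T, H, hd).transpose(1, 2)   # (B, H, T, hd)
k = (x @ Wk).view(B, T, H, hd).transpose(1, 2)   # one K per head -> KV cache scales with H
v = (x @ Wv).view(B, T, H, hd).transpose(1, 2)   # one V per head, likewise

scores = q @ k.transpose(-2, -1) / hd**0.5       # scaled dot-product attention, (B, H, T, T)
out = F.softmax(scores, dim=-1) @ v              # (B, H, T, hd)
out = out.transpose(1, 2).reshape(B, T, D) @ Wo  # merge heads back to model dim

print(out.shape)  # torch.Size([2, 8, 64])

# During autoregressive decoding, the cached K and V are each (B, H, T, hd) per layer.
# MQA keeps a single shared K/V head and GQA a small group of them, shrinking the
# cache by roughly a factor of H (or H divided by the group count).
```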