Multi-head attention
Concept · Mentioned in 1 video
A component of Transformer models in which each attention head has its own query, key, and value projections. Mentioned in contrast to more efficient attention schemes like grouped-query attention (GQA) and multi-query attention (MQA), which reduce KV cache size by sharing key/value heads across query heads.
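The contrast can be sketched numerically: a minimal, illustrative attention computation where `n_kv_heads` query heads share each key/value head. All sizes and weight shapes here are hypothetical, not from the source; setting `n_kv_heads == n_heads` recovers standard multi-head attention, and `n_kv_heads == 1` gives multi-query attention.

```python
import numpy as np

# Hypothetical sizes for illustration only.
d_model, n_heads, n_kv_heads, seq_len = 64, 8, 2, 10
d_head = d_model // n_heads

rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

# Queries get n_heads projections; keys/values get only n_kv_heads
# (grouped-query attention when 1 < n_kv_heads < n_heads).
Wq = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)
Wk = rng.standard_normal((d_model, n_kv_heads * d_head)) / np.sqrt(d_model)
Wv = rng.standard_normal((d_model, n_kv_heads * d_head)) / np.sqrt(d_model)

q = (x @ Wq).reshape(seq_len, n_heads, d_head)
k = (x @ Wk).reshape(seq_len, n_kv_heads, d_head)
v = (x @ Wv).reshape(seq_len, n_kv_heads, d_head)

group = n_heads // n_kv_heads
out = np.empty_like(q)
for h in range(n_heads):
    kv = h // group  # query head h reads the shared K/V head kv
    scores = q[:, h] @ k[:, kv].T / np.sqrt(d_head)
    out[:, h] = softmax(scores) @ v[:, kv]

# The KV cache stores k and v per token, so fewer KV heads means a
# proportionally smaller cache at inference time.
mha_cache = seq_len * n_heads * d_head * 2     # full multi-head attention
gqa_cache = seq_len * n_kv_heads * d_head * 2  # grouped-query attention
print(mha_cache // gqa_cache)  # → 4
```

With these assumed sizes, sharing each K/V head across 4 query heads shrinks the KV cache by 4x while the query side of the computation is unchanged.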