M

MLA (Multi-head Latent Attention)

Concept. Mentioned in 1 video.

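An attention mechanism from DeepSeek that compresses each token's keys and values into a single low-dimensional latent vector. Only that latent is cached, and it is projected back up into per-head keys and values during inference. This reduces KV cache size while retaining much of the expressiveness of standard multi-head attention.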
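The compress-then-expand idea can be illustrated with a minimal sketch. It is not DeepSeek's implementation: the dimensions, the weight names (`W_down`, `W_up_k`, `W_up_v`), and the helper functions are all hypothetical, the weights are random, and details such as positional encoding and the attention computation itself are left out. It only shows what gets cached and how many numbers that takes per token.

```python
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights (illustration only)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes chosen for illustration, not taken from any real model.
D_MODEL, N_HEADS, D_HEAD, D_LATENT = 64, 4, 16, 8

rng = random.Random(0)
W_down = rand_matrix(D_MODEL, D_LATENT, rng)           # hidden state -> shared latent
W_up_k = rand_matrix(D_LATENT, N_HEADS * D_HEAD, rng)  # latent -> keys for all heads
W_up_v = rand_matrix(D_LATENT, N_HEADS * D_HEAD, rng)  # latent -> values for all heads

def compress(hidden):
    """Project token hidden states (seq x D_MODEL) to latents (seq x D_LATENT)."""
    return matmul(hidden, W_down)

def expand(latents):
    """Rebuild keys and values (each seq x N_HEADS*D_HEAD) from cached latents."""
    return matmul(latents, W_up_k), matmul(latents, W_up_v)

hidden = rand_matrix(5, D_MODEL, rng)  # hidden states for 5 tokens
latent_cache = compress(hidden)        # only this is stored between decoding steps
keys, values = expand(latent_cache)    # recomputed from the cache when needed

# Per-token cache cost: separate K and V for every head vs. one shared latent.
floats_per_token_mha = 2 * N_HEADS * D_HEAD
floats_per_token_latent = D_LATENT
print(floats_per_token_mha, floats_per_token_latent)  # prints: 128 8
```

With these made-up sizes the cache holds 8 numbers per token instead of 128. The actual saving in a real model depends on its chosen latent width and head configuration.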