M
MLA (Multi-head Latent Attention)
Concept · Mentioned in 1 video
An attention mechanism from DeepSeek that compresses each token's keys and values into a single low-dimensional latent vector. Only that latent is stored in the KV cache; keys and values are reconstructed from it by up-projection during inference. This shrinks the KV cache while preserving much of the expressiveness of full multi-head attention.
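The compress-then-expand idea can be sketched in a few lines of plain Python. This is a minimal illustration under assumed toy sizes, not DeepSeek's implementation: the names `W_down`, `W_up_k`, `W_up_v`, `compress`, `expand`, and `cache_floats_per_token` are hypothetical, the weights are random rather than trained, and positional encoding and the attention computation itself are left out.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Random stand-in for a trained projection matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Illustrative sizes only (assumptions, not DeepSeek's configuration).
D_MODEL = 16    # hidden size of each token
N_HEADS = 4     # attention heads
HEAD_DIM = 4    # per-head key/value width
D_LATENT = 4    # width of the shared KV latent

rng = random.Random(0)
W_down = rand_matrix(D_LATENT, D_MODEL, rng)              # hidden -> latent
W_up_k = rand_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)   # latent -> keys
W_up_v = rand_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)   # latent -> values

def compress(hidden):
    """Down-project one token's hidden state to the latent that gets cached."""
    return matvec(W_down, hidden)

def expand(latent):
    """Rebuild the keys and values for all heads from a cached latent."""
    return matvec(W_up_k, latent), matvec(W_up_v, latent)

def cache_floats_per_token(n_heads, head_dim, d_latent):
    """Floats cached per token per layer: (full K and V, latent only)."""
    return 2 * n_heads * head_dim, d_latent

# Cache only the latent for each token in a short sequence.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(3)]
latent_cache = [compress(h) for h in tokens]

# At attention time, expand the cached latents back into keys and values.
keys, values = zip(*(expand(c) for c in latent_cache))

full, latent = cache_floats_per_token(N_HEADS, HEAD_DIM, D_LATENT)
print(f"per-token cache: {full} floats (K and V) vs {latent} floats (latent)")
```

With these toy sizes the cache holds `D_LATENT` numbers per token instead of `2 * N_HEADS * HEAD_DIM`, which is where the memory saving comes from; the price is the extra up-projection work when keys and values are rebuilt.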