MLA (Multi-head Latent Attention)

Concept

An attention mechanism introduced by DeepSeek (in DeepSeek-V2) that compresses keys and values into a single low-rank latent vector, which is projected back up to full keys and values during inference. Because only the compact latent needs to be cached per token, MLA sharply reduces KV cache size while preserving representational richness.
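A minimal NumPy sketch of the compression idea: hidden states are down-projected to a small latent, only that latent is cached, and per-head keys and values are reconstructed by up-projection. All dimensions and weight names here are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

# Illustrative dims (assumptions, not DeepSeek's real sizes).
rng = np.random.default_rng(0)
d_model, d_latent, n_heads, d_head = 64, 16, 4, 16

# Learned projections: one shared down-projection to the latent,
# separate up-projections for keys and values.
W_down = rng.standard_normal((d_model, d_latent)) * 0.1
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1

h = rng.standard_normal((10, d_model))   # hidden states for 10 tokens

c_kv = h @ W_down                        # (10, d_latent) -- only this is cached
k = (c_kv @ W_up_k).reshape(10, n_heads, d_head)  # reconstructed keys
v = (c_kv @ W_up_v).reshape(10, n_heads, d_head)  # reconstructed values

# Standard attention would cache 2 * n_heads * d_head floats per token;
# MLA caches only d_latent floats per token.
print(f"cached floats per token: {d_latent} vs {2 * n_heads * d_head}")
```

In a real model the up-projections can be folded into the query and output projections so the full keys and values are never materialized; this sketch materializes them only to keep the shapes visible.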

Mentioned in 1 video