Multi-Latent Attention

Concept

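A DeepSeek technique (formally Multi-head Latent Attention, or MLA, introduced in DeepSeek-V2) that compresses the key and value projections into a shared low-rank latent vector. Only this latent is stored in the KV cache, and per-head keys and values are recovered from it by up-projection. This shrinks the KV cache and improves inference efficiency.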
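A minimal NumPy sketch of the core idea, assuming toy dimensions that are not DeepSeek's actual configuration. The names `W_dkv`, `W_uk`, `W_uv`, `compress`, `expand` and `attend` are hypothetical. Hidden states are down-projected to a small latent, only the latent is cached, and keys and values are up-projected from it when attention runs. The sketch omits parts of the real design, such as the decoupled rotary position embedding, query compression, and folding the up-projections into the query and output matrices.

```python
import numpy as np

# Hypothetical toy sizes, chosen only for illustration
d_model = 64    # hidden size
n_heads = 4     # attention heads
d_head = 16     # per-head key/value dimension
d_latent = 16   # size of the cached KV latent (much smaller than 2 * n_heads * d_head)

rng = np.random.default_rng(0)
# Down-projection: hidden state -> shared KV latent (this is what gets cached)
W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
# Up-projections: latent -> keys and values for all heads
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)

def compress(h):
    """Map hidden states (seq, d_model) to latents (seq, d_latent) for the cache."""
    return h @ W_dkv

def expand(c):
    """Recover per-head keys and values (seq, n_heads, d_head) from cached latents."""
    seq = c.shape[0]
    k = (c @ W_uk).reshape(seq, n_heads, d_head)
    v = (c @ W_uv).reshape(seq, n_heads, d_head)
    return k, v

def attend(q, k, v):
    """Scaled dot-product attention for one query token q of shape (n_heads, d_head)."""
    scores = np.einsum("hd,shd->hs", q, k) / np.sqrt(d_head)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hs,shd->hd", weights, v)

def cache_floats_per_token_mha():
    return 2 * n_heads * d_head  # standard attention caches full K and V per head

def cache_floats_per_token_mla():
    return d_latent  # latent-based caching stores only the compressed vector

h = rng.standard_normal((10, d_model))   # 10 previous tokens
c_kv = compress(h)                        # the only thing kept in the cache
k, v = expand(c_kv)
q = rng.standard_normal((n_heads, d_head))
out = attend(q, k, v)

print(c_kv.shape, k.shape, out.shape)
print(cache_floats_per_token_mha(), cache_floats_per_token_mla())
```

With these toy sizes, each token caches 16 numbers instead of 128, an 8x reduction. The trade-off is the extra up-projection work and the constraint that all heads' keys and values live in one low-rank subspace.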
