Multi-Latent Attention

Concept

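An attention variant from DeepSeek, called Multi-head Latent Attention (MLA) in the DeepSeek-V2 paper. It compresses the key and value projections of each token into a single low-rank latent vector, so only that latent is stored in the KV cache. This shrinks the cache and lowers memory use during inference.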
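The compression idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not DeepSeek's implementation: the sizes, the weight names (`W_down`, `W_up_k`, `W_up_v`), and the helper functions are all hypothetical, the weights are random, and details such as positional-encoding handling and the attention computation itself are omitted.

```python
import random

def matvec(matrix, vec):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of random weights (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes, chosen only for illustration.
D_MODEL = 16   # hidden size of each token
N_HEADS = 4    # number of attention heads
HEAD_DIM = 4   # per-head key/value size
D_LATENT = 4   # size of the compressed KV latent

rng = random.Random(0)
W_down = random_matrix(D_LATENT, D_MODEL, rng)             # hidden -> latent
W_up_k = random_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> keys
W_up_v = random_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> values

def compress(hidden):
    """Down-project a token's hidden state; only this result is cached."""
    return matvec(W_down, hidden)

def expand(latent):
    """Rebuild full-size keys and values from a cached latent."""
    return matvec(W_up_k, latent), matvec(W_up_v, latent)

# Cache one latent per token instead of full keys and values.
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)] for _ in range(3)]
latent_cache = [compress(h) for h in tokens]

# At attention time, keys and values are reconstructed from the cache.
keys, values = expand(latent_cache[0])

# Numbers stored per token (per layer) under each scheme.
floats_per_token_standard = 2 * N_HEADS * HEAD_DIM  # separate keys and values
floats_per_token_latent = D_LATENT                  # one shared latent
```

With these toy sizes the per-token cache drops from 32 numbers to 4. The saving comes entirely from choosing a latent size smaller than the combined key and value width.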

Mentioned in 1 video

Videos Mentioning Multi-Latent Attention

Stanford Online
