DeepSeek Sparse Attention

Concept

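A sparse attention method that selects a subset of KV cache tokens to keep for each attention step, using lighter-weight queries to score the cached tokens and determine which ones are important. Full attention is then computed only over the selected tokens, so its cost scales with the size of the subset rather than with the full context length.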
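The selection step can be illustrated with a minimal sketch for a single query. This is an assumption-laden toy, not DeepSeek's implementation: the names `select_top_k`, `sparse_attention`, `light_query`, and `light_keys` are hypothetical, the "lighter-weight" scoring is modeled as a dot product in a smaller dimension, and the selection rule is assumed to be plain top-k.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(light_query, light_keys, k):
    """Score every cached token with the cheap query (assumed: a low-dimensional
    dot product) and return the positions of the k highest-scoring tokens."""
    scores = [dot(light_query, lk) for lk in light_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # keep original token order


def sparse_attention(query, keys, values, light_query, light_keys, k):
    """Run ordinary scaled dot-product attention, but only over the selected tokens."""
    idx = select_top_k(light_query, light_keys, k)
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, idx


# Hypothetical toy data: 4 cached tokens, full dimension 3, lightweight dimension 2.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
light_keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

out, idx = sparse_attention(
    query=[1.0, 0.0, 0.0],
    keys=keys,
    values=values,
    light_query=[1.0, 0.0],
    light_keys=light_keys,
    k=2,
)
```

In this sketch the cheap scoring pass still touches every cached token, but the expensive part (softmax attention over full-size keys and values) runs over only `k` of them. Setting `k` to the number of cached tokens reduces it to ordinary dense attention.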

Mentioned in 1 video