DeepSeek Sparse Attention
Concept
A method that selects, for each query token, a subset of KV cache tokens to attend to, using lighter-weight indexer queries to score which tokens are important.
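The two-stage idea can be illustrated with a minimal sketch for a single query. It assumes plain dot-product scoring for both the cheap indexer pass and the main attention pass. The names `sparse_attention`, `q_idx` and `keys_idx` are hypothetical, and the indexer here is a simple stand-in, not a reproduction of DeepSeek's actual scoring function. In this sketch the full cache stays in memory; only the set of entries each query attends to shrinks to `k`.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    # Numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(scores, k):
    # Indices of the k highest scores; ties broken by lower index
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def sparse_attention(q, keys, values, q_idx, keys_idx, k):
    """Attend from one query over only the k cached tokens picked by a cheap indexer.

    Hypothetical sketch: q_idx / keys_idx are small, low-dimensional vectors
    standing in for the lighter-weight indexer queries and keys (an assumption,
    not DeepSeek's exact design).
    """
    # Stage 1: cheap scoring pass over the whole cache with the small indexer vectors
    scores = [dot(q_idx, kx) for kx in keys_idx]
    selected = top_k_indices(scores, k)

    # Stage 2: ordinary scaled dot-product attention restricted to the selected entries
    d = len(q)
    logits = [dot(q, keys[i]) / math.sqrt(d) for i in selected]
    weights = softmax(logits)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out, selected


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
q_idx = [1.0]
keys_idx = [[0.1], [0.9], [0.5], [0.2]]

out, selected = sparse_attention(q, keys, values, q_idx, keys_idx, k=2)
print(selected)  # [1, 2]
```

With `k` equal to the cache length, the result reduces to ordinary dense attention; the savings come from the indexer pass being much cheaper per token than the main attention it replaces.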