Sliding Window Attention
Concept
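An attention mechanism in which each token attends only to the most recent k tokens rather than the full sequence. Because keys and values older than k positions are never read again, the KV cache stays bounded at k entries instead of growing with sequence length, which makes it suitable for long contexts.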
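As a minimal sketch of the idea, the Python below builds the attention mask for a window of k tokens and shows a KV cache that never holds more than k entries. The names `sliding_window_mask` and `RollingKVCache` are hypothetical, and the sketch assumes the window counts the current token as one of the k; real implementations may define the window differently.

```python
from collections import deque


def sliding_window_mask(seq_len, k):
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption: the window is causal and includes the current token,
    so position i sees positions i-k+1 .. i.
    """
    return [[i - k < j <= i for j in range(seq_len)] for i in range(seq_len)]


class RollingKVCache:
    """Hypothetical cache that keeps only the k most recent (key, value) pairs."""

    def __init__(self, k):
        # deque(maxlen=k) silently drops the oldest entry once k are stored.
        self.entries = deque(maxlen=k)

    def append(self, key, value):
        self.entries.append((key, value))

    def __len__(self):
        return len(self.entries)


if __name__ == "__main__":
    # Each row is a query position; 1 marks a key position it can attend to.
    for row in sliding_window_mask(seq_len=5, k=3):
        print("".join("1" if allowed else "0" for allowed in row))

    cache = RollingKVCache(k=3)
    for t in range(10):
        cache.append(f"key{t}", f"value{t}")
    print(len(cache))  # stays at k no matter how many tokens were appended
```

With a full causal mask the cache would hold one entry per generated token; here its size is fixed by k, which is where the memory saving comes from.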