Sliding Window Attention

Concept

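An attention mechanism in which each token attends only to the last k tokens rather than the full preceding sequence. Because keys and values older than the window are never read again, the KV cache can be capped at k entries per layer instead of growing with sequence length, which makes the mechanism suitable for long contexts.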
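
As a rough illustration of the idea, the sketch below builds a causal sliding-window mask and a fixed-size KV cache in plain Python. The names `sliding_window_mask` and `RollingKVCache` are hypothetical, and the sketch assumes the window of k tokens includes the current token; real implementations differ in these details.

```python
from collections import deque


def sliding_window_mask(seq_len, k):
    """Return mask[i][j], True when query position i may attend to key position j.

    Assumption: the window covers the current token plus the k - 1 tokens before it.
    """
    return [[i - k < j <= i for j in range(seq_len)] for i in range(seq_len)]


class RollingKVCache:
    """Hypothetical cache that keeps only the most recent k key/value pairs."""

    def __init__(self, k):
        # A deque with maxlen=k silently drops the oldest entry when full.
        self.keys = deque(maxlen=k)
        self.values = deque(maxlen=k)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


mask = sliding_window_mask(seq_len=5, k=3)

cache = RollingKVCache(k=3)
for t in range(5):
    # Strings stand in for the per-token key and value vectors.
    cache.append(f"k{t}", f"v{t}")
```

After five tokens with k = 3, the cache holds entries for positions 2 to 4 only, so its size stays constant however long the sequence grows.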
