PagedAttention

Concept

A technique introduced in the vLLM paper that partitions the KV cache into fixed-size blocks which can be stored non-contiguously in memory, inspired by virtual memory paging in operating systems. This reduces memory fragmentation and improves serving efficiency.
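The bookkeeping idea can be sketched as a block table that maps each sequence's logical token positions to physical blocks. This is a minimal illustrative sketch, not the vLLM implementation; the names `PagedKVCache`, `BLOCK_SIZE`, and the method names are all hypothetical:

```python
BLOCK_SIZE = 4  # tokens per block (illustrative; real systems use e.g. 16)

class PagedKVCache:
    """Hypothetical sketch of paged KV-cache bookkeeping."""

    def __init__(self, num_physical_blocks):
        self.free_blocks = list(range(num_physical_blocks))  # pool of physical blocks
        self.block_tables = {}  # seq_id -> list of physical block ids (in logical order)
        self.seq_lens = {}      # seq_id -> number of tokens stored so far

    def append_token(self, seq_id):
        """Reserve space for one new token's key/value entries."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.seq_lens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:
            # Current block is full (or no block yet): grab any free physical
            # block -- blocks need not be contiguous, which avoids fragmentation.
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def physical_location(self, seq_id, token_idx):
        """Translate a logical token index to (physical block, offset)."""
        block = self.block_tables[seq_id][token_idx // BLOCK_SIZE]
        return block, token_idx % BLOCK_SIZE

cache = PagedKVCache(num_physical_blocks=8)
for _ in range(6):
    cache.append_token("seq0")
# 6 tokens with BLOCK_SIZE=4 occupy 2 physical blocks; token 5 sits at
# offset 1 inside the sequence's second block.
print(len(cache.block_tables["seq0"]))           # → 2
print(cache.physical_location("seq0", 5)[1])     # → 1
```

Because the block table indirects every lookup, blocks for one sequence can live anywhere in the pool, and freeing a finished sequence simply returns its blocks to `free_blocks` with no compaction needed.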

Mentioned in 1 video