PagedAttention
Concept
A technique introduced in the vLLM paper that partitions the KV cache into fixed-size blocks that need not be contiguous in memory, reducing fragmentation and improving memory efficiency, analogous to virtual-memory paging in operating systems.
Mentioned in 1 video
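The core idea can be sketched as a page-table-style mapping from a sequence's logical token positions to non-contiguous physical blocks. This is an illustrative sketch only; the class and method names (`BlockAllocator`, `SequenceKVCache`, `block_size`) are hypothetical and not vLLM's actual API.

```python
# Minimal sketch of PagedAttention-style KV-cache block management.
# All names here are illustrative, not part of vLLM's real implementation.

class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        # Physical blocks come back in no particular order, so a
        # sequence's blocks are generally non-contiguous in memory.
        return self.free.pop()

    def release(self, block_id):
        self.free.append(block_id)


class SequenceKVCache:
    """Per-sequence block table mapping logical blocks to physical blocks."""
    def __init__(self, allocator, block_size=16):
        self.allocator = allocator
        self.block_size = block_size
        self.block_table = []   # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the current one fills,
        # so memory is reserved on demand instead of up front.
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

    def locate(self, token_pos):
        # Translate a logical token position to (physical block, offset),
        # much like a page table translating virtual to physical addresses.
        return (self.block_table[token_pos // self.block_size],
                token_pos % self.block_size)
```

Because blocks are allocated lazily and returned to a shared pool when a sequence finishes, fragmentation stays bounded by at most one partially filled block per sequence.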