Cross-Layer Attention

Concept

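A technique that shares key-value (KV) caches across transformer layers: a group of adjacent layers reuses the keys and values computed by one layer in the group, so only that layer needs to store a cache. This reduces inference memory usage, which can in turn allow larger batch sizes or longer contexts.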
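The memory effect can be estimated with a little arithmetic. The sketch below is illustrative only: the function names, the `sharing_factor` parameter, and the grouping rule (each group of consecutive layers reuses the KV of the first layer in the group) are assumptions made for this example, not details taken from a specific implementation.

```python
import math


def kv_owner(layer: int, sharing_factor: int) -> int:
    """Index of the layer whose keys/values `layer` reuses.

    Assumed grouping: layers are split into consecutive groups of
    `sharing_factor`, and the first layer of each group computes KV.
    """
    return layer - layer % sharing_factor


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,   # e.g. 16-bit floats
    sharing_factor: int = 1,   # 1 means no cross-layer sharing
) -> int:
    """Approximate KV cache size in bytes."""
    # Only one layer per group stores keys and values.
    kv_layers = math.ceil(num_layers / sharing_factor)
    # Factor of 2 accounts for storing both keys and values.
    return (2 * kv_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen only for illustration.
    dims = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
    baseline = kv_cache_bytes(**dims)
    shared = kv_cache_bytes(**dims, sharing_factor=2)
    print(f"baseline: {baseline} bytes, shared: {shared} bytes")
    print("KV owner of each of the first 6 layers:",
          [kv_owner(i, 2) for i in range(6)])
```

Under these assumptions, a sharing factor of 2 halves the number of layers that store keys and values, and so halves the cache size when the layer count is even.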
