Cross-Layer Attention

Concept

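A technique in which groups of adjacent transformer layers share the same key/value (KV) activations, so the KV cache is stored once per group rather than once per layer. This reduces inference memory usage, which can leave room for longer contexts or larger batches.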
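The memory effect of sharing can be illustrated with a rough back-of-the-envelope calculation. This is a minimal sketch, not taken from the video: the function name, the `share_factor` parameter (number of layers sharing one KV cache), and the model dimensions in the usage lines are all hypothetical values chosen for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2, share_factor=1):
    """Estimate KV cache size in bytes.

    share_factor is a hypothetical knob: the number of adjacent layers
    that share one set of cached keys and values (1 means no sharing).
    """
    if share_factor < 1 or n_layers % share_factor != 0:
        raise ValueError("share_factor must be >= 1 and divide n_layers evenly")
    # Only one layer per sharing group stores keys and values.
    kv_layers = n_layers // share_factor
    # Factor of 2 accounts for storing both keys and values.
    return (2 * kv_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16-bit values.
baseline = kv_cache_bytes(32, 8, 128, seq_len=4096)
shared = kv_cache_bytes(32, 8, 128, seq_len=4096, share_factor=2)
print(baseline // 2**20, "MiB without sharing")
print(shared // 2**20, "MiB with pairs of layers sharing")
```

Under these assumptions the cache shrinks in direct proportion to the sharing factor. The estimate covers only the cache itself and says nothing about any effect on model quality.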

Mentioned in 1 video

Videos Mentioning Cross-Layer Attention

Stanford Online
