windowed attention

Concept

A workaround for the quadratic cost of full attention over long contexts: each token attends only to a fixed-size window of recent tokens instead of the entire sequence. Information from older tokens can still reach later positions, but only indirectly through stacked layers, which tends to degrade recall of distant context.
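As a rough illustration of the idea (not tied to any particular model), here is a minimal NumPy sketch of single-head attention with a causal sliding-window mask; the function names and the window size are illustrative choices, not an established API.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # True where query position i may attend to key position j:
    # causal (j <= i) and within the last `window` positions.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def windowed_attention(q, k, v, window):
    # q, k, v: arrays of shape (seq_len, d); single head, no batching.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = sliding_window_mask(q.shape[0], window)
    scores = np.where(mask, scores, -np.inf)  # block out-of-window keys
    # Softmax over the allowed keys; the diagonal is always allowed,
    # so every row has at least one finite score.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With window = 2, token 4 attends only to tokens 3 and 4; token 0 can never be attended to by token 4 directly, which is the source of the degraded performance on older context.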

Mentioned in 1 video