windowed attention
Concept · Mentioned in 1 video
A workaround for long-context limitations in transformers in which each token attends only to a fixed-size window of recent tokens. Anything outside the window is discarded from attention, which keeps compute and memory bounded but degrades performance on older context.
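A minimal NumPy sketch of the idea (illustrative only; the function name and window size are assumptions, not from the video). Each query position is masked so it can see at most the `window` most recent positions, including itself:

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Causal attention where position i attends only to positions j
    with i - window < j <= i; older tokens are masked out entirely."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    i = np.arange(t)[:, None]
    j = np.arange(t)[None, :]
    # Causal sliding-window mask: keep only the last `window` positions.
    mask = (j <= i) & (j > i - window)
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the unmasked positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
t, d = 8, 16
q, k, v = (rng.standard_normal((t, d)) for _ in range(3))
out = sliding_window_attention(q, k, v, window=4)
print(out.shape)  # (8, 16)
```

Note that position 0 can only attend to itself, so its output is exactly `v[0]` — the "degraded performance on older context" comes from every position being similarly blind to tokens more than `window` steps back.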