Ring Attention

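A technique used by Gradient to train its long-context models. It splits a long sequence into blocks spread across GPUs arranged in a ring; each GPU computes attention on the key/value block it currently holds while passing blocks on to its neighbor, which improves GPU utilization.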
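
To make the idea concrete, here is a minimal single-process sketch in plain Python. It is an illustration under stated assumptions, not Gradient's implementation: the "devices" are just list shards, attention is single-head and non-causal, and the names `ring_attention` and `full_attention` are hypothetical. A real system would run each shard on its own GPU and overlap the block transfer with the computation.

```python
import math


def full_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v computed over the whole sequence."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out


def ring_attention(q, k, v, n_devices):
    """Simulate ring attention: each 'device' owns one Q block and sees K/V blocks in turn."""
    n, d, dv = len(q), len(q[0]), len(v[0])
    assert n % n_devices == 0, "sequence length must divide evenly (assumption of this sketch)"
    blk = n // n_devices

    def shard(x):
        return [x[i * blk:(i + 1) * blk] for i in range(n_devices)]

    q_sh, k_sh, v_sh = shard(q), shard(k), shard(v)

    # Running online-softmax state per query row: max score, normalizer, weighted sum.
    m = [[-math.inf] * blk for _ in range(n_devices)]
    z = [[0.0] * blk for _ in range(n_devices)]
    acc = [[[0.0] * dv for _ in range(blk)] for _ in range(n_devices)]

    for _ in range(n_devices):
        for dev in range(n_devices):
            for i, qi in enumerate(q_sh[dev]):
                for kj, vj in zip(k_sh[dev], v_sh[dev]):
                    s = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                    m_new = max(m[dev][i], s)
                    scale = math.exp(m[dev][i] - m_new)  # rescale earlier partial results
                    w = math.exp(s - m_new)
                    z[dev][i] = z[dev][i] * scale + w
                    acc[dev][i] = [a * scale + w * x for a, x in zip(acc[dev][i], vj)]
                    m[dev][i] = m_new
        # Ring step: every device hands its current K/V block to the next device.
        k_sh = k_sh[-1:] + k_sh[:-1]
        v_sh = v_sh[-1:] + v_sh[:-1]

    return [[a / z[dev][i] for a in acc[dev][i]]
            for dev in range(n_devices) for i in range(blk)]


# Toy inputs: 8 tokens, head dimension 4, split over 4 simulated devices.
q = [[math.sin(3 * i + j) for j in range(4)] for i in range(8)]
k = [[math.cos(2 * i + j) for j in range(4)] for i in range(8)]
v = [[math.sin(i - j) for j in range(4)] for i in range(8)]
ring_out = ring_attention(q, k, v, n_devices=4)
```

No device ever holds more than one key/value block at a time, so per-device memory grows with the block size rather than the full sequence length, while the running-softmax rescaling keeps the result equal to ordinary attention.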