linear attention

Concept (mentioned in 1 video)

An early attempt (around 2020) to make attention mechanisms sub-quadratic in sequence length by removing the softmax nonlinearity. In practice it ran into problems with both model quality and hardware efficiency.
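Why dropping the softmax helps can be seen in a small sketch. Without the softmax, the product (QK^T)V can be regrouped as Q(K^T V), so the n x n score matrix is never built and the cost grows linearly with sequence length n. The code below is only an illustration under stated assumptions: it is non-causal, it uses plain Python lists, and the positive feature map `elu(x) + 1` and all function names are choices made for this example, not something the entry above specifies.

```python
import math


def feature_map(x):
    # Assumed feature map: elu(x) + 1, which is always positive.
    return x + 1.0 if x > 0 else math.exp(x)


def matmul(a, b):
    # Plain list-of-rows matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def apply_map(m):
    return [[feature_map(x) for x in row] for row in m]


def quadratic_attention(q, k, v):
    """Softmax-free attention the O(n^2) way: (phi(Q) phi(K)^T) V."""
    fq, fk = apply_map(q), apply_map(k)
    scores = matmul(fq, transpose(fk))      # n x n matrix
    out = matmul(scores, v)                 # n x d_v
    # Normalize each output row by the sum of its scores.
    return [[x / sum(s) for x in o] for o, s in zip(out, scores)]


def linear_attention(q, k, v):
    """Same result, regrouped as phi(Q) (phi(K)^T V): no n x n matrix."""
    fq, fk = apply_map(q), apply_map(k)
    kv = matmul(transpose(fk), v)           # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]  # length-d sum of key features
    out = matmul(fq, kv)                    # n x d_v
    norm = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in o] for o, z in zip(out, norm)]


# Tiny example: 3 tokens, feature dimension 2, value dimension 2.
q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
k = [[0.4, 0.0], [-0.3, 0.9], [0.2, -0.6]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

slow = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
```

Both functions return the same values up to floating-point error; only the order of the matrix products differs. The quality issue mentioned above is not visible in such a sketch, since it concerns how well the softmax-free form models data, not whether the two groupings agree.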