linear attention

Concept

An early attempt (around 2020) to make attention sub-quadratic by removing the softmax nonlinearity, which lets the matrix products be reordered so the cost grows linearly rather than quadratically with sequence length. In practice it suffered from quality degradation and poor hardware efficiency compared with standard softmax attention.
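A minimal sketch of the idea, assuming the common kernelized formulation (replacing softmax with a feature map such as elu(x)+1, as in Katharopoulos et al., 2020); the function names and sizes here are illustrative:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.where(x > 0, x + 1.0, np.exp(x))):
    # Replace softmax(QK^T)V with phi(Q) (phi(K)^T V).
    # Computing phi(K)^T V first gives a (d, d_v) summary, so the
    # cost is O(N * d * d_v) instead of O(N^2 * d).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                # (d, d_v): aggregated key-value summary
    Z = Qp @ Kp.sum(axis=0)      # (N,): per-query normalizer
    return (Qp @ KV) / Z[:, None]

N, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

The key step is associativity: because the softmax is gone, `(phi(Q) phi(K)^T) V` can be regrouped as `phi(Q) (phi(K)^T V)`, avoiding the N-by-N attention matrix entirely.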

Mentioned in 1 video