Attention Free Transformer

Paper

A paper by Apple researchers that BlinkDL, the creator of RWKV, adapted to scale up language models without traditional attention mechanisms.

Mentioned in 1 video