FlashAttention
Concept
An IO-aware, memory-efficient kernel for computing the attention operation inside the Transformer architecture. It tiles the computation so the full attention matrix is never materialized in GPU high-bandwidth memory. Developed by academic researchers (Tri Dao et al., Stanford).
Mentioned in 1 video
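The core trick behind FlashAttention is an online (streaming) softmax: keys and values are processed in blocks while only running row maxima, normalizers, and partial outputs are kept. A minimal NumPy sketch of that idea (illustrative only, not the fused CUDA kernel; function and variable names are my own):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style tiling: visit K/V one block at a time,
    # carrying only O (partial output), m (row max), l (normalizer).
    N, d = Q.shape
    O = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row-wise max of scores
    l = np.zeros(N)           # running softmax normalizer
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)            # scores for this block only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)            # rescale old partial results
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Because each block's contribution is rescaled as the running maximum updates, the result matches the naive computation while only a `block × d` slice of K and V is resident at a time.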