FlashAttention

Concept

An efficient GPU kernel for the attention operation in the Transformer architecture, developed in academia by Tri Dao and collaborators at Stanford. It computes exact (not approximate) attention while never materializing the full attention score matrix in slow GPU memory: the computation is tiled into blocks that fit in fast on-chip SRAM, with a running ("online") softmax combining the blocks, which cuts memory traffic and speeds up both training and inference.
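The tiling-plus-online-softmax idea can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the real fused CUDA kernel; the function names and the `block` parameter are hypothetical, chosen here for demonstration. Each iteration touches only a block of K and V and updates running row-wise max and sum statistics, so the full n x n score matrix never exists in memory at once:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full (n x n) score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention_sketch(Q, K, V, block=4):
    # Sketch of FlashAttention-style tiling with an online softmax.
    # Only a block-sized slice of K and V is processed per step.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, V.shape[-1]))
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running row-wise sum of exp(scores - m)
    for j in range(0, n, block):
        Kj, Vj = K[j:j+block], V[j:j+block]
        S = Q @ Kj.T * scale                   # scores for this block only
        m_new = np.maximum(m, S.max(axis=-1))  # updated running max
        alpha = np.exp(m - m_new)              # rescale factor for old stats
        P = np.exp(S - m_new[:, None])
        l = alpha * l + P.sum(axis=-1)
        O = alpha[:, None] * O + P @ Vj
        m = m_new
    return O / l[:, None]   # final normalization
```

The blocked result matches naive attention exactly (up to floating-point rounding); the savings on real hardware come from keeping each block's scores in on-chip SRAM instead of reading and writing the whole score matrix to HBM.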

Mentioned in 1 video