FlashAttention (paper)

Study / Research

Stanford paper (Tri Dao et al.) with a CUDA implementation, exposed through PyTorch, that fuses attention into tiled kernels and speeds it up by never materializing the full N×N attention matrix in GPU memory.
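A minimal NumPy sketch of the tiling/online-softmax idea (not the actual CUDA kernel): keys and values are processed in blocks, with a running row-max and softmax denominator, so only an N×block score tile exists at any time instead of the full N×N matrix.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    # Online-softmax tiling: never forms the full N x N matrix.
    n, d = Q.shape
    m = np.full(n, -np.inf)              # running row max
    l = np.zeros(n)                      # running softmax denominator
    acc = np.zeros((n, V.shape[-1]))     # running weighted sum of V
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)   # only N x block scores
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)               # rescale old partials
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        acc = acc * scale[:, None] + P @ V[j:j + block]
        m = m_new
    return acc / l[:, None]
```

Both functions return the same result; the tiled version is how FlashAttention trades repeated small tiles in fast SRAM for avoiding large reads/writes to HBM.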
