FlashAttention (paper)
Study / Research
Stanford paper and implementation that fuses the attention computation into a single GPU kernel (usable from PyTorch) to speed up attention by avoiding materializing the full N×N attention matrix in GPU memory.
Mentioned in 1 video
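The core trick can be illustrated in plain Python: process the keys/values in blocks, keeping a running softmax max, denominator, and output, so the full score vector is never stored at once. This is only a sketch of the online-softmax idea behind FlashAttention (function names and block size are illustrative, not from the paper's CUDA code):

```python
import math

def naive_attention_row(q, K, V):
    # Reference: materializes all scores for one query row at once.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    d = len(V[0])
    return [sum(e * v[j] for e, v in zip(exps, V)) / denom for j in range(d)]

def flash_attention_row(q, K, V, block=2):
    # Tiled version: only `block` scores exist at a time (illustrative block size).
    d = len(V[0])
    m = float("-inf")   # running max of scores seen so far
    l = 0.0             # running softmax denominator
    o = [0.0] * d       # running unnormalized output
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in Kb]
        m_new = max(m, max(scores))
        # Rescale previous partial results to the new max for numerical stability.
        scale = math.exp(m - m_new) if m != float("-inf") else 0.0
        l *= scale
        o = [x * scale for x in o]
        for s, v in zip(scores, Vb):
            w = math.exp(s - m_new)
            l += w
            o = [x + w * vj for x, vj in zip(o, v)]
        m = m_new
    return [x / l for x in o]
```

Both functions give the same result; the tiled one just never holds the full row of scores, which is what lets the real CUDA kernels keep everything in fast on-chip SRAM.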