F

FlashAttention (paper)

Study / Research · Mentioned in 1 video

Stanford paper and implementation that fuses the attention computation into a single CUDA kernel (now integrated into PyTorch's scaled dot-product attention) to speed up attention by avoiding materializing the full N×N attention matrix in GPU memory.
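The core trick can be illustrated outside CUDA: process K/V in blocks with a running ("online") softmax so that only a small score tile exists at any time, instead of the full N×N matrix. Below is a minimal NumPy sketch of that idea — a pedagogical illustration, not the paper's actual kernel (which also tiles Q and manages GPU SRAM explicitly); all function names here are invented for the example.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix before the softmax.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=2):
    # Streams over K/V blocks with a running softmax, so only an
    # N x block score tile is ever held in memory at once.
    N, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row-wise max of the scores
    l = np.zeros(N)           # running softmax denominator
    scale = 1.0 / np.sqrt(d)
    for j in range(0, N, block):
        S = Q @ K[j:j + block].T * scale        # small tile only
        m_new = np.maximum(m, S.max(axis=-1))
        corr = np.exp(m - m_new)                # rescale old partials
        P = np.exp(S - m_new[:, None])
        l = l * corr + P.sum(axis=-1)
        out = out * corr[:, None] + P @ V[j:j + block]
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
reference = naive_attention(Q, K, V)
tiled = tiled_attention(Q, K, V)
```

Both paths compute the same result; the tiled version just never builds the full score matrix, which is what lets the fused kernel keep everything in fast on-chip memory.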