F
FlashAttention (paper)
Study / Research · Mentioned in 1 video
Stanford paper and implementation that fuses the attention computation into a single GPU kernel (usable from PyTorch), speeding up attention by avoiding materializing the full attention matrix in GPU memory.
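The core trick can be sketched in plain NumPy: process the keys/values in tiles while keeping a running row-max and softmax normalizer (online softmax), so only a small score tile exists at any time instead of the full n×n matrix. This is an illustrative sketch of the idea, not the paper's fused CUDA implementation; all names here are hypothetical.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full (n, n) score matrix in memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style tiling (sketch): walk over K/V in blocks,
    # maintaining a running max m and normalizer l per query row, so
    # the (n, n) score matrix is never formed.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row max
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, n, block):
        Kb, Vb = K[j:j + block], V[j:j + block]
        S = Q @ Kb.T * scale                  # only an (n, block) tile
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        alpha = np.exp(m - m_new)             # rescale previous partial sums
        l = alpha * l + P.sum(axis=-1)
        O = alpha[:, None] * O + P @ Vb
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V)))  # True
```

The two functions agree numerically; the tiled version's peak memory scales with the block size rather than the sequence length, which is the property the paper exploits to keep attention in fast on-chip SRAM.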