# FlashAttention
An IO-aware, memory-efficient implementation of exact attention for Transformers; often cited alongside efforts like FlashFFTConv, which brings similar kernel-level optimization to state space models (SSMs).
## Videos Mentioning FlashAttention
- **2024 in Post-Transformer Architectures: State Space Models, RWKV [Latent Space LIVE! @ NeurIPS 2024]** (Latent Space): Mentioned as an example of specialized kernels for Transformers, paralleled by efforts like FlashFFTConv for SSMs.
- **A Comprehensive Overview of Large Language Models - Latent Space Paper Club** (Latent Space): An optimized implementation of the attention mechanism that improves memory efficiency.
- **llm.c's Origin and the Future of LLM Compilers - Andrej Karpathy at CUDA MODE** (Latent Space): An optimized implementation of the attention mechanism for Transformers, which llm.c uses for improved performance.
- **The End of Finetuning — with Jeremy Howard of Fast.ai** (Latent Space): An optimization technique for attention mechanisms, mentioned as an example of the kind of innovation that could be made easier by better languages such as Mojo.
- **Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 4: Attention Alternatives** (Stanford Online): A systems-level optimization for attention that rearranges operations to minimize memory-transfer overhead, yielding significant performance improvements.
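The "rearranges operations to minimize memory transfer" idea in the last description can be illustrated with a small sketch. This is not the actual FlashAttention CUDA kernel, just a single-head NumPy toy (function names and block size are illustrative) showing the core trick: processing K/V in tiles with an online softmax so the full N x N score matrix is never materialized.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix in memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style tiling with an online softmax:
    # K and V are consumed block by block, keeping only O(N) running state.
    N, d = Q.shape
    O = np.zeros((N, d))          # unnormalized output accumulator
    m = np.full(N, -np.inf)       # running row-wise max of the scores
    l = np.zeros(N)               # running softmax normalizer
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)               # (N, block) partial scores
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)               # rescale previous accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

On GPU, the payoff of this rearrangement is that each tile of scores lives in fast on-chip SRAM rather than being written to and re-read from high-bandwidth memory, which is where the speedups discussed in the lecture come from.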