flash attention

Software / App

An optimized implementation of transformer attention, which Manifest AI's Vidril framework can match or outperform, especially on non-standard problem shapes.

Mentioned in 8 videos

What podcasters actually say about flash attention.

8 mentions, no marketing. Save them all to a pod and ask any question.

Get Started Free

Videos Mentioning flash attention

Answer.ai & AI Magic with Jeremy Howard

Answer.ai & AI Magic with Jeremy Howard

Latent Space

An optimized attention mechanism for Transformers, its compatibility issues with newer versions of Transformers were discussed.

Building an open AI company - with Ce and Vipul of Together AI

Building an open AI company - with Ce and Vipul of Together AI

Latent Space

An optimization technique for attention mechanisms in transformers, open-sourced and contributing to better AI models.

Ep 18: Petaflops to the People — with George Hotz of tinycorp

Ep 18: Petaflops to the People — with George Hotz of tinycorp

Latent Space

Flash Attention is highlighted as an algorithmic trick that improves efficiency without increasing compute, similar to Hotz's approach with Tiny Grad.

⚡️ Beyond Transformers with Power Retention

⚡️ Beyond Transformers with Power Retention

Latent Space

An optimized implementation of transformer attention, which Manifest AI's Vidril framework can match or outperform, especially on non-standard problem shapes.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 5: GPUs, TPUs

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 5: GPUs, TPUs

Stanford Online

An optimized algorithm for computing attention mechanisms in neural networks, designed for efficiency and reduced memory usage on GPUs.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 6: Kernels, Triton, XLA

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 6: Kernels, Triton, XLA

Stanford Online

Mentioned as a potential application or goal that the lecture's concepts will enable students to implement.

Stanford CS25: Transformers United V6 I The Ultra-Scale Talk: Scaling Training to Thousands of GPUs

Stanford CS25: Transformers United V6 I The Ultra-Scale Talk: Scaling Training to Thousands of GPUs

Stanford Online

An optimized attention mechanism that inspired the online softmax approach used in Ring Attention.

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Guest Lecture: Dan Fu

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Guest Lecture: Dan Fu

Stanford Online

An optimized attention mechanism mentioned as something students in the class might implement for training.