flash attention
An optimized implementation of transformer attention that computes exact attention in tiles, avoiding materialization of the full attention matrix in GPU memory and thereby reducing memory traffic.
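The core idea can be illustrated with a minimal NumPy sketch (an illustrative simplification, not the actual CUDA kernel): process keys and values in blocks while maintaining a running row-max and running softmax denominator (the "online softmax" trick), so the full score matrix is never held in memory at once.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full (n x n) score matrix -- the memory cost
    # that flash attention avoids.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Processes K/V in blocks with an online softmax: only an
    # (n x block) tile of scores exists at any time.
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row max
    l = np.zeros(n)           # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)              # score tile only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)              # rescale earlier partials
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        out = out * scale[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

Both functions return the same exact result; the difference is peak memory, which is why flash attention is an optimization rather than an approximation.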
Videos Mentioning flash attention

Answer.ai & AI Magic with Jeremy Howard
Latent Space
An optimized attention mechanism for Transformers; the discussion covered its compatibility issues with newer versions of the Transformers library.

Building an open AI company - with Ce and Vipul of Together AI
Latent Space
An open-sourced optimization technique for the attention mechanism in transformers, credited with contributing to better AI models.

Ep 18: Petaflops to the People — with George Hotz of tinycorp
Latent Space
Flash Attention is highlighted as an algorithmic trick that improves efficiency without adding compute, similar in spirit to Hotz's approach with tinygrad.

⚡️ Beyond Transformers with Power Retention
Latent Space
An optimized implementation of transformer attention, which Manifest AI's Vidril framework can match or outperform, especially on non-standard problem shapes.