We cover two recent Microsoft contributions that extend ideas from the original **FlashAttention**. FlashAttention is an IO-aware attention algorithm for Transformers, designed to address the quadratic time and memory complexity of standard self-attention on long sequences. By using **tiling and recomputation** to minimize accesses to slow **High Bandwidth Memory (HBM)** in favor of fast **on-chip SRAM**, FlashAttention achieves significant wall-clock speedups when training models like BERT and GPT-2, enabling them to handle much longer context lengths.

Microsoft's new **ATTENTION2D** builds on memory-efficient methods like FlashAttention to optimize **distributed self-attention** across multiple GPUs. It parallelizes along two dimensions (Q-DIM and KV-DIM) to overcome the communication bottleneck inherent in prior single-dimension parallel approaches like Ring Attention.

Microsoft's other contribution is **Lean Attention**, which also appears to propose a high-performance, tiled execution strategy for attention, using shared memory and iterative computation, similar to the IO-aware concepts in the other sources.

Sources:
The original FlashAttention paper: https://arxiv.org/pdf/2205.14135
FlashAttention-2 paper: https://arxiv.org/pdf/2307.08691
June 28, 2025, Microsoft's Attention2D: https://arxiv.org/pdf/2503.15758
Microsoft's Lean Attention: https://www.microsoft.com/en-us/research/wp-content/uploads/2024/05/Lean_Attention___arxiv_version.pdf
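For listeners who want a feel for the tiling idea in the description, here is a minimal sketch in plain Python. It is not the FlashAttention kernel and is not taken from any of the papers: it handles a single query vector, runs on the CPU, and the function names `naive_attention` and `tiled_attention` are hypothetical. It only illustrates the running-softmax trick that lets attention be computed one key/value block at a time without ever holding the full row of scores.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: computes every score first, then one softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Same output, but keys/values are visited one block at a time.

    Only a running max, a running normalizer, and a running weighted
    sum are kept between blocks (an assumption-level sketch of tiling).
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim

    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_block]

        # Rescale what was accumulated so far to the new running max.
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # 0.0 on the first block
        running_sum *= correction
        acc = [a * correction for a in acc]

        # Fold this block into the running normalizer and weighted sum.
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max

    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding for any block size. The real kernels apply the same idea to blocks of queries as well, keeping each tile in on-chip SRAM.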