
AI: post transformers

ATTENTION2D and Lean Attention: Distributed Self-Attention

29 Oct 2025

Description

We cover two new innovations from Microsoft that extend ideas from the original **FlashAttention**. FlashAttention is an IO-aware attention algorithm for Transformers designed to address the quadratic time and memory complexity of standard self-attention on long sequences. By using **tiling and recomputation** to minimize slow **High Bandwidth Memory (HBM)** accesses in favor of fast **on-chip SRAM**, FlashAttention achieves significant wall-clock speedups for training models like BERT and GPT-2, enabling them to handle much longer context lengths. Microsoft's new **ATTENTION2D** builds on memory-efficient methods like FlashAttention to optimize **distributed self-attention** across multiple GPUs, parallelizing along two dimensions (Q-DIM and KV-DIM) to overcome the communication bottleneck inherent in prior single-dimension parallel approaches such as Ring Attention. Microsoft's additional contribution to the research community is **Lean Attention**, which likewise proposes a high-performance, tiled execution strategy for attention, using shared memory and iterative computation, similar to the IO-aware concepts in the other sources.

Sources:
- The original FlashAttention paper: https://arxiv.org/pdf/2205.14135
- The FlashAttention-2 paper: https://arxiv.org/pdf/2307.08691
- Microsoft's ATTENTION2D (June 28, 2025): https://arxiv.org/pdf/2503.15758
- Microsoft's Lean Attention: https://www.microsoft.com/en-us/research/wp-content/uploads/2024/05/Lean_Attention___arxiv_version.pdf
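The mechanism these papers share is tiled attention with an online (running) softmax, which keeps the working set in fast on-chip memory. Below is a minimal NumPy sketch of that idea; the function name, block sizes, and shapes are illustrative assumptions of ours, and real kernels run these loops in SRAM/shared memory on the GPU rather than in Python.

```python
# Minimal sketch of tiled attention with an online softmax (FlashAttention-style).
# Block sizes and shapes are illustrative assumptions, not values from the papers.
import numpy as np

def tiled_attention(Q, K, V, block_q=64, block_kv=64):
    """Compute softmax(Q K^T / sqrt(d)) V one tile at a time, keeping only a
    running max, a running softmax denominator, and a partial output per query block."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q, dtype=np.float64)

    for qs in range(0, n, block_q):
        q = Q[qs:qs + block_q]                    # one query tile
        m = np.full(q.shape[0], -np.inf)          # running row max
        l = np.zeros(q.shape[0])                  # running softmax denominator
        acc = np.zeros((q.shape[0], d))           # unnormalized partial output

        for ks in range(0, n, block_kv):
            k = K[ks:ks + block_kv]
            v = V[ks:ks + block_kv]
            s = (q @ k.T) * scale                 # attention scores for this tile

            m_new = np.maximum(m, s.max(axis=1))  # update running max
            p = np.exp(s - m_new[:, None])        # unnormalized tile probabilities
            correction = np.exp(m - m_new)        # rescale previously accumulated partials
            l = l * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ v
            m = m_new

        out[qs:qs + block_q] = acc / l[:, None]   # final normalization per query row
    return out

# Sanity check against the naive quadratic implementation.
rng = np.random.default_rng(0)
Q = rng.standard_normal((256, 32))
K = rng.standard_normal((256, 32))
V = rng.standard_normal((256, 32))

naive = np.exp((Q @ K.T) / np.sqrt(32))
naive = (naive / naive.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), naive, atol=1e-8)
```

Because each tile produces a partial output together with its running max and denominator, partial results can be rescaled and merged in any order. Roughly speaking, that merge rule is what allows ATTENTION2D to split the query dimension (Q-DIM) and the key/value dimension (KV-DIM) across a 2D grid of GPUs and combine the pieces with a reduction instead of ring-style point-to-point communication, and what Lean Attention's shared-memory, iterative execution exploits within a single device.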

