
AI: post transformers

ALiBi: Attention with Linear Biases Enables Length Extrapolation

01 Nov 2025

Description

This April 22, 2022 paper, a collaboration between the University of Washington, Facebook AI Research, and the Allen Institute for AI, introduces Attention with Linear Biases (ALiBi), a simple and efficient method for position representation in transformer models that addresses the challenge of **extrapolation**: a model's ability to maintain performance on input sequences longer than those seen during training. The authors show that traditional position encoding methods, such as sinusoidal embeddings, extrapolate poorly, while alternatives like the T5 relative-position bias extrapolate better but are computationally costly. **ALiBi enables extrapolation** by biasing query-key attention scores with a penalty proportional to the distance between query and key, eliminating the need for positional embeddings entirely. The approach is shown to be **faster and more memory-efficient** than the baselines, enabling a 1.3 billion parameter model trained on sequences of length 1,024 to achieve comparable or better perplexity when evaluated on significantly longer sequences. The findings further suggest that ALiBi's perplexity gains when extrapolating are largely due to mitigating the "early token curse" that arises in common sequence-splitting evaluation methods.
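To make the mechanism concrete, here is a minimal NumPy sketch of the distance-proportional penalty described above. It is an illustration, not the authors' released code: the function names `alibi_slopes`, `alibi_bias`, and `attention_with_alibi`, and the example shapes, are ours. Each attention head gets a fixed slope, and the score between query position i and key position j is penalized in proportion to i − j before the softmax.

```python
import math
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric slopes from the paper: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    # This closed form matches the paper when n_heads is a power of two.
    start = 2.0 ** (-8.0 / n_heads)
    return np.array([start ** (h + 1) for h in range(n_heads)])

def alibi_bias(seq_len: int, n_heads: int) -> np.ndarray:
    # Per-head additive bias: slope * -(i - j) for query position i, key position j.
    i = np.arange(seq_len)[:, None]           # query positions
    j = np.arange(seq_len)[None, :]           # key positions
    distance = (j - i).astype(np.float64)     # 0 on the diagonal, -1, -2, ... for past keys
    return alibi_slopes(n_heads)[:, None, None] * distance   # (heads, q_len, k_len)

def attention_with_alibi(q, k, v):
    # Causal scaled dot-product attention with ALiBi biases added to the scores.
    # q, k, v: arrays of shape (heads, seq_len, head_dim); no positional embeddings used.
    heads, seq_len, head_dim = q.shape
    scores = q @ k.transpose(0, 2, 1) / math.sqrt(head_dim)   # (heads, q_len, k_len)
    scores = scores + alibi_bias(seq_len, heads)              # distance-proportional penalty
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)                # causal mask: hide future keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                        # (heads, seq_len, head_dim)

# Tiny usage example with random inputs (hypothetical shapes).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16, 64)) for _ in range(3))
print(attention_with_alibi(q, k, v).shape)  # (8, 16, 64)
```

Because the bias depends only on relative distance and the slopes are fixed rather than learned, nothing in the mechanism is tied to the training length, which is what allows evaluation on sequences longer than those seen during training.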
