AI Breakdown

arxiv preprint - Transformers need glasses! Information over-squashing in language tasks

17 Jun 2024

Description

In this episode, we discuss "Transformers need glasses! Information over-squashing in language tasks" by Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G. M. Araújo, Alex Vitvitskyi, Razvan Pascanu, and Petar Veličković. The paper studies how information propagates in decoder-only Transformers and identifies a phenomenon in which distinct input sequences can yield nearly identical final-token representations. The effect is worsened by low-precision floating-point formats and leaves the model unable to tell these sequences apart, leading to errors in tasks such as counting or copying. The authors provide theoretical and empirical evidence of the problem and suggest simple ways to mitigate it.
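The collapse described above can be illustrated with a toy sketch. This is not the paper's construction: it assumes a uniform average over scalar token values as a stand-in for attention, rounds only the final result to half precision, and uses hypothetical helper names (`to_fp16`, `mean_representation`, `make_sequences`).

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mean_representation(tokens, half_precision=False):
    """Toy 'final-token representation': the uniform average of the tokens.

    Assumption: uniform averaging stands in for attention; only the final
    value is rounded to half precision, not the intermediate sum.
    """
    value = sum(tokens) / len(tokens)
    return to_fp16(value) if half_precision else value


def make_sequences(n):
    """Two distinct sequences: n ones followed by a zero, and n + 1 ones."""
    return [1.0] * n + [0.0], [1.0] * (n + 1)


# Short sequences stay distinguishable even in half precision.
short_a, short_b = make_sequences(3)
short_distinct = (
    mean_representation(short_a, half_precision=True)
    != mean_representation(short_b, half_precision=True)
)

# Long sequences differ at full precision but round to the same half-precision value.
long_a, long_b = make_sequences(5000)
long_distinct_full = mean_representation(long_a) != mean_representation(long_b)
long_collapsed_half = (
    mean_representation(long_a, half_precision=True)
    == mean_representation(long_b, half_precision=True)
)
```

In this sketch the single zero at the end of the long sequence shifts the average by about 1/5001, which is smaller than half the spacing between half-precision values just below 1, so both sequences end up with the same rounded value.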
