AI: post transformers

Adafactor: Memory-Efficient Adaptive Learning Rates

27 Aug 2025

Description

This April 2018 paper introduces Adafactor, an optimization method designed to reduce the memory footprint of adaptive learning rate algorithms such as Adam, particularly for large neural networks. Adafactor does this by estimating per-parameter second moments from a factored representation: for each n × m weight matrix it keeps only the per-row and per-column sums of the moving averages of squared gradients, which reduces auxiliary memory from O(nm) to O(n+m). The paper also addresses training instability in adaptive methods, proposing update clipping and a gradually increasing decay rate for the second-moment accumulator as remedies. In addition, it proposes scaling parameter updates relative to the magnitude of the parameters themselves rather than using absolute step sizes. Experiments with the Transformer model on machine translation show that Adafactor achieves performance comparable to Adam while requiring significantly less auxiliary memory.

Source: https://arxiv.org/pdf/1804.04235
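
The three mechanisms named in the description can be illustrated with a minimal pure-Python sketch. It is not the paper's reference implementation. The function names are hypothetical, and the default values for the clipping threshold (1.0) and the decay exponent (0.8) are assumptions chosen for illustration. For simplicity the factoring is applied to a single matrix of squared gradients rather than to a moving average.

```python
import math

def factored_second_moment(sq_grads):
    """Rebuild an n x m matrix of squared gradients from its row and column sums.

    Only n + m numbers are stored, and the estimate is the rank-1 matrix
    V_hat[i][j] = row_sums[i] * col_sums[j] / total.
    """
    row_sums = [sum(row) for row in sq_grads]        # n values
    col_sums = [sum(col) for col in zip(*sq_grads)]  # m values
    total = sum(row_sums)
    if total == 0:
        # All gradients are zero, so the estimate is zero everywhere.
        return [[0.0 for _ in col_sums] for _ in row_sums]
    return [[r * c / total for c in col_sums] for r in row_sums]

def rms(values):
    """Root-mean-square of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def clip_update(update, threshold=1.0):
    """Scale the update down so its RMS does not exceed `threshold` (assumed default)."""
    scale = max(1.0, rms(update) / threshold)
    return [u / scale for u in update]

def decay_rate(step, exponent=0.8):
    """Second-moment decay rate that rises toward 1 as `step` grows (step >= 1).

    The form 1 - step**(-exponent) and the exponent value are assumptions here.
    """
    return 1.0 - step ** (-exponent)

# A rank-1 matrix is recovered exactly from its row and column sums.
v_hat = factored_second_moment([[1.0, 2.0], [2.0, 4.0]])

# An update with RMS above the threshold is scaled back to RMS 1.
clipped = clip_update([3.0, 4.0])
```

For a general matrix the reconstruction is only an approximation, which is the trade-off made in exchange for storing n + m values instead of nm.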
