Ajeya Cotra
Algorithmic progress.
How effectively do researchers discover new algorithms that make training AIs cheaper and more efficient?
Total effective compute.
The product of real compute and algorithmic progress.
So, for example, the Epoch column's 10.7x means that in any given year, you can train an AI 10.7 times better than the year before, because you have 3.6 times more compute available, and that compute is 3 times more efficient.
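The compounding here is just a product of annual growth factors. A minimal sketch, using the 3.6x and 3x figures from the example above:

```python
# Effective compute growth = (real compute growth) x (algorithmic efficiency growth).
# Figures from the example above: 3.6x more raw compute, 3x more efficient per FLOP.
compute_growth = 3.6   # hardware and spending: raw FLOP available per year
algo_growth = 3.0      # algorithmic progress: effective FLOP per raw FLOP

effective_growth = round(compute_growth * algo_growth, 1)
print(effective_growth)  # 10.8, matching the ~10.7x in the table up to rounding
```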
Cotra and Davidson were pretty close on willingness to spend and on FLOP per dollar.
This is an impressive achievement.
They more or less predicted the giant data center build-out of the past few years.
They ignored training run length, which probably seemed like a reasonable simplification at the time, but they got killed on algorithmic progress, which was 200% per year instead of 30%.
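To see how badly a 30%-per-year assumption misses a 200%-per-year reality, compound both rates over a few years (the rates are the ones stated above; the five-year horizon is just for illustration):

```python
# Compound two annual algorithmic-progress rates over an illustrative 5-year horizon.
assumed = 1.30   # 30% per year: the forecast's assumption
actual = 3.00    # 200% per year: a 3x efficiency gain each year
years = 5        # hypothetical horizon, chosen only to illustrate the gap

assumed_gain = assumed ** years   # ~3.7x over 5 years
actual_gain = actual ** years     # 243x over 5 years
print(actual_gain / assumed_gain) # the forecast is off by a factor of ~65
```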
How did they get this one so wrong?
Here's Cotra's section on algorithmic progress.
Algorithmic progress forecasts.
Note, I've done very little research into algorithmic progress trends.
Of the four main components of my model, 2020 compute requirements, algorithmic progress, compute price trends, and spending on computation, I've spent the least time thinking about algorithmic progress.
I consider two types of algorithmic progress: relatively incremental and steady progress from iteratively improving architectures and learning algorithms, and the chance of "breakthrough progress," which brings the technical difficulty of training a transformative model down from astronomically large or impossible to broadly feasible.
For incremental progress, the main source I used was Hernandez and Brown (2020), "Measuring the Algorithmic Efficiency of Neural Networks."
The authors re-implemented open-source state-of-the-art, or SOTA, ImageNet models between 2012 and 2019, six models in total.
They trained each model up to the point that it achieved the same performance as AlexNet achieved in 2012, and recorded the total FLOP that this required.
They found that the SOTA model in 2019, EfficientNet B0, required around 44 times fewer training FLOP to achieve AlexNet performance than AlexNet did.
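From that 44x figure over the 2012–2019 window you can back out the implied efficiency doubling time, which is how Hernandez and Brown report their headline result; the arithmetic below just restates their numbers:

```python
import math

# 44x fewer training FLOP for AlexNet-level performance, 2012 -> 2019 (7 years).
efficiency_gain = 44
months = 7 * 12

doublings = math.log2(efficiency_gain)  # ~5.46 doublings of efficiency
doubling_time = months / doublings      # ~15.4 months per doubling
print(round(doubling_time, 1))
```

This lands near the paper's headline figure of training efficiency doubling roughly every 16 months.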