Ajeya Cotra
The six data points fit a power-law curve, with the amount of computation required to match AlexNet halving roughly every 16 months over the seven years of the dataset.
They also show that linear programming displayed a similar trend over a longer period of time: with hardware held fixed, the time in seconds taken to solve a standard basket of mixed-integer programming problems by state-of-the-art commercial software packages halved approximately every 13 months over the 21 years from 1996 to 2017.
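The arithmetic behind these halving-time claims is simple: a halving time T over a span of M months implies a total efficiency gain of 2^(M/T). A minimal sketch (my own illustration, not code from either paper):

```python
# A halving time T over a span of M months implies required compute
# (or runtime) falls by a total factor of 2**(M / T).

def total_speedup(months_elapsed, halving_time_months):
    """Total factor by which required compute (or runtime) falls."""
    return 2 ** (months_elapsed / halving_time_months)

# AlexNet-level image classification: ~16-month halving over 7 years.
alexnet_gain = total_speedup(7 * 12, 16)   # roughly 38x

# Mixed-integer programming: ~13-month halving over 21 years.
mip_gain = total_speedup(21 * 12, 13)      # on the order of 10**5 to 10**6
```

Note how sensitive the cumulative gain is to the time span: a similar halving time compounds into a vastly larger total improvement over 21 years than over 7.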
Grace 2013, Algorithmic Progress in Six Domains, is the only other paper attempting to systematically quantify algorithmic progress that I'm currently aware of, although I have not done a systematic literature review and may be missing others.
I've chosen not to examine it in detail because (a) it was written largely before the deep learning boom and mostly does not focus on ML tasks, and (b) it is less straightforward to translate Grace's results into the format I am most interested in:
How has the amount of computation required to solve a fixed task decreased over time?
Paul is familiar with the results, and he believes that algorithmic progress across the six domains studied in Grace 2013 is consistent with a similar but slightly lower rate of progress, with halving times of 13 to 36 months for the computation required to reach a fixed level of performance.
Additionally, it seems plausible to me that both sets of results would overestimate the pace of algorithmic progress on a transformative task, because they're both focusing on relatively narrow problems with simple, well-defined benchmarks that large groups of researchers could directly optimize.
Because no one has trained a transformative model yet, to the extent that the computation required to train one is falling over time, that fall would have to happen via proxies rather than via researchers directly optimizing the metric itself.
For example, perhaps architectural innovations that improve training efficiency for image classifiers or language models would translate to a transformative model.
Additionally, it may be that halving the amount of computation required to train a transformative model would require making progress on multiple, partially independent subproblems: for example, vision, language, and motor control.
I have attempted to take the Hernandez and Brown 2020 halving times and Paul's summary of the Grace 2013 halving times as anchoring points and shade them upward to account for the considerations raised above.
There is massive room for judgment in whether and how much to shade upward.
I expect many readers will want to change my assumptions here and some will believe it is more reasonable to shade downward.
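To get a feel for how much this judgment call matters, it helps to see how different assumed halving times compound over a long horizon. The horizons and halving times below are illustrative choices of mine, not figures from the report:

```python
# How much the compute required for a fixed task falls over a 30-year
# horizon under different assumed halving times. Illustrative numbers only.

def compute_reduction(years, halving_time_months):
    """Total factor by which required compute falls over the horizon."""
    return 2 ** (years * 12 / halving_time_months)

# 16 months is close to the measured AlexNet trend; 24 and 36 months
# represent shading the halving time upward to varying degrees.
for halving in (16, 24, 36):
    print(f"{halving}-month halving over 30 years: "
          f"{compute_reduction(30, halving):,.0f}x less compute")
```

Shading the halving time from 16 up to 36 months cuts the projected 30-year compute reduction by more than three orders of magnitude, which is why there is so much room for disagreement here.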
Scott writes that the rate of algorithmic progress depends on whether you are looking at an easy task, where the low-hanging fruit has already been picked, or a hard task, where there is still lots of room to improve. AlexNet was an easy task, but pushing the frontier of AI is a hard task, so algorithmic progress in frontier AI has been faster than the AlexNet paper estimated.
In Cotra's defense, she admitted that this was the area where she was least certain, and that she had rounded the progress rate down based on various considerations when other people might round it up based on various other considerations.
But the sheer extent of the error here, compounded with a few smaller errors that unfortunately all shared the same direction, was enough to throw off the estimate entirely.