Ege Erdil
And overall, you can summarize those estimates by thinking about the returns to research effort.
And, you know, we've looked into the returns to research effort in software specifically.
And we looked at a bunch of domains: in traditional software, things like linear and integer programming solvers or SAT solvers, but also in AI, like computer vision and RL and language modeling.
And if this model is true, that all you need is just cognitive effort, it seems like the estimates are a bit ambiguous about whether this results in acceleration or merely exponential growth.
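To make that ambiguity concrete, here is a minimal sketch of the kind of idea-production model these returns estimates are usually mapped onto; the functional form and symbols are illustrative assumptions, not something spelled out in the conversation:

\[
\dot{A} = k\, R^{\lambda} A^{\phi}, \qquad r \equiv \frac{\lambda}{1 - \phi},
\]

where \(A\) is the level of software or algorithmic efficiency, \(R\) is research effort, and \(r\) is the returns to research effort. If research effort itself scales with \(A\) (fully automated AI research), then \(\dot{A} \propto A^{\lambda + \phi}\), which accelerates without bound in finite time when \(\lambda + \phi > 1\) (equivalently \(r > 1\), assuming \(\phi < 1\)), stays exponential when \(r = 1\), and is subexponential when \(r < 1\). So whether you get acceleration or merely exponential growth turns on whether the estimated \(r\) sits above or below one.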
And then you might also think about, well, it isn't just research effort that you have to scale up to make these innovations, because you might have complementary inputs.
So as you mentioned, experiments are the thing that might kind of bottleneck you.
And I think there's a lot of evidence that, in fact, these experiments and scaling up hardware are just very important for getting progress in the algorithms and the architecture and so on.
And this is true in AI, but also for software in general: if you look at progress in software, it often very closely matches the rate of progress we see in hardware.
So for traditional software, we see roughly a 30% increase per year, which basically matches Moore's Law.
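As a rough consistency check on that comparison (my arithmetic, not a figure from the conversation), a 30% annual improvement corresponds to a doubling time of

\[
\frac{\ln 2}{\ln 1.3} \approx 2.6 \text{ years},
\]

which is in the same ballpark as the roughly two-year doubling commonly associated with Moore's Law.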
And in AI, we've seen the same until you get to the deep learning era, and then you get this acceleration, which in fact coincides with the acceleration we see in compute scaling, which gives you a hint that actually the compute scaling might have been very important.
Another piece of evidence, besides this coinciding rate of progress, is the fact that innovation in algorithms and architectures is often concentrated in GPU-rich labs and not in the GPU-poor parts of the world, like academia or maybe smaller research institutes.
That also suggests that having a lot of hardware is very important.
If you look at specific innovations that seem very important, the big innovations over the past five years, many of them have some kind of scaling- or hardware-related motivation.
So you might look at the transformer itself, which was about how to harness more parallel compute.
Things like FlashAttention were literally about how to implement the attention mechanism more efficiently, or things like the Chinchilla scaling law.
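For reference, the Chinchilla result fits a loss curve of the following form and concludes that, for a fixed compute budget, parameter count and training tokens should be scaled up in roughly equal proportion (the fitted constants are omitted here):

\[
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad N_{\mathrm{opt}} \propto C^{\,a}, \quad D_{\mathrm{opt}} \propto C^{\,b}, \quad a \approx b \approx 0.5,
\]

where \(N\) is the number of parameters, \(D\) the number of training tokens, and \(C \approx 6ND\) the training compute; in practice this works out to on the order of 20 training tokens per parameter. The point being that the result is precisely about how best to spend a given hardware budget.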