Ege Erdil
And so many of these big innovations were just about how to harness your compute more effectively.
That also tells you that actually the scaling of compute might be very important.
And I think there are many pieces of evidence that point towards this complementarity picture.
So I would say that even if you assume that experiments are not particularly important,
the evidence we have, both from estimates in AI and in other software, although the data is not great, suggests that maybe you don't get this hyperbolic, faster-than-exponential growth in the overall algorithmic efficiency of systems.
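To make that distinction concrete, here is a purely illustrative sketch, not something from the conversation: it contrasts steady exponential growth in algorithmic efficiency with hyperbolic growth, where returns compound on the level itself and the quantity diverges in finite time. The constants and the simple Euler integration are made up for illustration.

```python
# Illustrative only: contrasts exponential growth (dA/dt = k*A) with
# hyperbolic growth (dA/dt = k*A**(1 + eps)), which diverges in finite time.
# The constants k and eps are arbitrary, chosen just to show the two shapes.

def simulate(growth_rate, years, dt=0.01, hyperbolic_exponent=0.0):
    """Euler-integrate dA/dt = growth_rate * A**(1 + hyperbolic_exponent)."""
    A = 1.0
    t = 0.0
    while t < years:
        A += dt * growth_rate * A ** (1.0 + hyperbolic_exponent)
        t += dt
        if A > 1e12:  # treat this as "blown up" in finite time
            return t, float("inf")
    return t, A

# Exponential: efficiency grows at a steady rate and never diverges.
print(simulate(growth_rate=0.7, years=10))
# Hyperbolic: the same initial rate, but the gains compound on themselves
# and efficiency diverges well before the 10-year horizon.
print(simulate(growth_rate=0.7, years=10, hyperbolic_exponent=0.3))
```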
I mean, AI researchers will often kind of overstate the extent to which just cognitive effort and doing research is important for driving these innovations, because that's often kind of convenient or useful.
They will say the insight was, you know, derived from some nice idea about statistical mechanics, or some nice equation in physics that says we should do it this way.
But often that's kind of an ad hoc story that they tell to make it a bit more compelling to the reviewers.
Quite high.
Then you're conditioning on the compute not being very large.
So it must be that you get a bunch of software progress.
I think a callout that I want to make is: I know that some labs do have multiple pre-training teams, and they give people different amounts of resources for doing the training, different amounts of cognitive effort, and different sizes of teams.
But none of that I think has been published and I would love to see the results of some of those experiments.
I think even that won't update you very strongly, just because it is often very inefficient to do this very imbalanced scaling of your factor inputs.
And in order to really get an estimate of how strong these complementarities are, you need to observe these very imbalanced scale-ups.
And so that rarely happens.
And so I think the data that bears on this is just really quite poor.
And then the intuitions that people have also don't seem clearly relevant to the thing that matters: what happens if you do this very imbalanced scaling, and where does that net out?
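As a hypothetical illustration of why these imbalanced scale-ups are so rare, consider a CES-style production function in which compute and research effort are strong complements. The functional form and parameters below are assumptions for the sketch, not a model discussed in the conversation: with strong complementarity, scaling research effort alone a hundredfold barely moves output, so nobody runs that experiment, and that is exactly the regime you would need to observe to pin down how strong the complementarity is.

```python
# Hypothetical CES production function for "research output" from
# compute C and cognitive/research effort R. rho < 0 means the two
# inputs are strong complements (low elasticity of substitution).
# All parameter values here are arbitrary, for illustration only.

def ces_output(compute, research, rho=-2.0, share=0.5):
    """CES aggregate: (share*C^rho + (1-share)*R^rho)^(1/rho)."""
    return (share * compute**rho + (1 - share) * research**rho) ** (1.0 / rho)

base = ces_output(1.0, 1.0)

# Balanced scale-up: both inputs grow 100x, so output grows 100x.
print(ces_output(100.0, 100.0) / base)

# Imbalanced scale-up: only research effort grows 100x while compute is
# held fixed; with rho = -2 the output gain is capped near sqrt(2) ~ 1.41,
# so the scale-up looks wildly inefficient and is rarely attempted.
print(ces_output(1.0, 100.0) / base)
```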
How did you find them?