Sholto Douglas
Then the whole thing gets multiplied together, and the whole thing becomes much less likely to happen.
Wait, wait.
So doesn't the fact that there are these companies, Google and, I don't know, Magic, maybe others, who have million-token attention imply that the quadratic... You shouldn't say anything.
Doesn't that imply that it's not quadratic anymore, or are they just eating the cost?
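For reference, a minimal sketch of why naive attention is quadratic in context length: scoring every token against every other token means the FLOP count grows with the square of the number of tokens. The head dimension of 128 below is an arbitrary illustrative value, not any lab's actual config.

```python
# Toy FLOP count for one naive attention layer (illustrative only).

def naive_attention_flops(n_tokens: int, d_head: int) -> int:
    """Approximate FLOPs for naive attention.

    QK^T is an (n x d) @ (d x n) matmul -> ~2*n*n*d FLOPs, and mixing
    the values is an (n x n) @ (n x d) matmul -> another ~2*n*n*d.
    """
    return 2 * n_tokens * n_tokens * d_head * 2

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} tokens: ~{naive_attention_flops(n, 128):.2e} FLOPs per layer")
```

Going from 1,000 to 1,000,000 tokens multiplies the per-layer cost by a factor of a million, which is why long-context providers either eat that cost or use some sub-quadratic approximation.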
Okay, so what do you make of this take?
As we move forward through the takeoff, more and more of the learning happens in the forward pass.
So originally, all the learning happens in the backward pass, during this bottom-up, hill-climbing, evolutionary process.
If you think in the limit, during the intelligence explosion it's just the AI maybe handwriting the weights or doing GOFAI or something.
And we're at the middle step, where a lot of the learning happens in context now with these models, though a lot of it still happens in the backward pass.
Does this seem like a meaningful gradient along which progress is happening?
The broader point being: if you're learning in the forward pass, it's much more sample efficient, because you can basically think as you're learning.
When humans read a textbook, you're not just skimming it and trying to inductively absorb which words follow which words.
You read it and you think about it, then you read some more and think about it.
I don't know.
Does this seem like a sensible way to think about the progress?
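As a rough illustration of the distinction being drawn here, a toy sketch (a numpy stand-in, not any real model's mechanics): a backward-pass update permanently changes the weights, while forward-pass "learning" leaves the weights frozen and carries the new information in the input.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))           # stand-in for model weights

def forward(weights, x):
    return weights @ x                 # stand-in for one forward pass

# --- Learning in the backward pass: the weights themselves move. ---
x, target = rng.normal(size=4), rng.normal(size=4)
pred = forward(W, x)
grad = np.outer(pred - target, x)      # gradient of (half) squared error w.r.t. W
W_updated = W - 0.01 * grad            # knowledge is baked into new weights

# --- Learning in the forward pass: weights stay frozen, and the input
# --- (crudely, the prompt) carries the new information instead.
context = rng.normal(size=4)           # stand-in for in-context examples
pred_with_context = forward(W, x + context)  # same W, different behavior

print(np.allclose(W, W_updated))       # False: the backward pass changed W
```

The gradient step is the "bottom-up hill climbing" end of the spectrum; the context-conditioned call is the "learning in the forward pass" end, where nothing about the weights changes between calls.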
This is actually an interesting point.
So when we talk about scaling up these models, how much of it comes from just making the models themselves bigger?
And how much comes from the fact that during any single call, you are using more compute?
So if you think of diffusion, you can just iteratively keep adding more compute.
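A toy sketch of that point, assuming nothing about any real diffusion model: the parameter count stays fixed, while the number of denoising steps, and therefore the compute spent per call, can be dialed up arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) * 0.1     # fixed weights: model size unchanged

def denoise_step(x):
    return x - 0.1 * np.tanh(W @ x)   # one pass of a toy denoiser

def sample(num_steps: int):
    x = rng.normal(size=8)            # start from pure noise
    for _ in range(num_steps):        # more steps = more compute, same W
        x = denoise_step(x)
    return x

sample(10)    # a cheap call
sample(1000)  # 100x the compute per call, identical parameter count
```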