Sholto Douglas
If you think the tokens are equivalent.
Yeah.
In which case, you still get pretty substantial numbers. Even with your 100 million H100s, you multiply that by 100 and you're starting to get to pretty substantial numbers.
This does mean that those models themselves will be somewhat compute-bound in many respects.
But these are all relatively short-term changes in the timeline of progress, basically.
I think, yes, it's highly likely we get dramatically inference-bottlenecked in '27, '28.
And the impulse then will be, OK, let's just try and turn out as many semiconductors as we possibly can.
There'll be some lag there.
A big part of how fast we can do that will depend on how much people are feeling the AGI in the next two years as they're building out fab capacity.
A lot will depend on how the Taiwan situation goes. Are Taiwan's fabs still producing all the chips?
Yeah, this is like a bimodal distribution.
Yeah.
A conversation I had with Leopold turned into a section in Situational Awareness called "This Decade or Bust," which is on exactly this topic: basically, that for the next couple of years we can dramatically increase our training compute.
And RL is going to be so exciting this year because we can dramatically increase the amount of compute that we apply to it.
And this is also one of the reasons why the gap between, say, DeepSeek and o1 was so small at the beginning of the year: they were able to apply the same amount of compute to the RL process.
And so that compute differential actually will be magnified over the course of this year.
Yeah, they're exactly on the sort of cost curve that you'd expect.
Which is not to take away from the fact that they're brilliant engineers and brilliant researchers. I look at their work and I think, ah, there's a kindred soul in the work they're doing.