Dylan Patel
So, you know, like we didn't get to that.
I think we're early on non-text.
Yeah, we're quite early on.
And then the other angle is that just because you've used the text doesn't mean you can't learn faster.
You know, it's like take a classroom again.
Like, you know, this is like Machiavellian. I don't even know if Machiavellian is the right word, but you take a class, you give them all a book, you tell them to read it once, and you test them all.
It's like one kid's going to get 100 and one kid's going to get a 40, right?
It's just the reality of life.
And if you read the book out loud to them, the kid who got 100 might get a 30 and the kid who got a 40 might get a 60, right?
So there's like these different parameters.
And when we talk about model architecture, the same thing happens there.
So it's not like you stop training new models.
It's not like you don't have algorithmic improvements or smarter kids, right? So it's not like pre-training is done.
It's the base of everything.
So you want to keep having gains, because any gain on pre-training, i.e. the model learns a little faster or the model is a little smaller for the same quality, feeds into the next stage, which is this whole post-training side, which will subsume the majority of the compute at some point.
And inning-wise, are we in the second inning of that?