Ege Erdil
But yeah, so that suggests that we might need a lot more compute scaling to get these additional capabilities to be unlocked.
And then there's the question of do we really have that in us as an economy to be able to sustain that scaling?
I think some people, I mean, if it's literally just a "book a flight" job and without, you know...
I mean, I guess you could have made similar points five years ago and said, you know, you look at AlphaZero and there's this mini AGI there.
And if only you unhobbled it by training it on text and giving it all your context and so on, like that just wouldn't really have worked.
I think you do really need to rethink how you train these models in order to get these capabilities.
So when you say reasoning is easy and, you know, it only took this much compute and it wasn't very much and maybe you look at the sheer number of tokens and it wasn't very much and so it looks easy, well, that's kind of true from our position today.
But I think if you ask someone, build a reasoning model in 2015, then it would have looked insurmountable.
You would have had to train a model on tens of thousands of GPUs.
You would have had to solve, you know, that problem, and each order of magnitude of scaling from where they were would pose new challenges that they would need to solve.
You would need to produce kind of internet-scale data, tens of trillions of tokens, in order to actually train a model that has the knowledge that you can then unlock and access by way of training it to be a reasoning model.
You need to maybe make the model more efficient at kind of doing inference and maybe distill it because if it's very slow, then you have a reasoning model that's not particularly useful.
So you also need to make various innovations to, you know, get the model to be distilled so that you can train it more quickly, because these rollouts take very long.
It actually becomes a product that's valuable only if it's fast; if it's a couple of tokens a second, as a reasoning model that would have been very difficult to work with.
So in some sense, it looks easy from our point of view, standing on this huge stack of technology that we've built up over the past five years or so.
But at the time, it would have been very hard.
And so my claim would be something like, I think the agency part might be easy in a similar sense,