Mark Zuckerberg
compute required to training is probably much higher than most other companies that are doing this stuff just because of the sheer volume of the community that we're serving.
Yeah, yeah.
Although, and one of the interesting things about it that we saw even with the 70 billion is we thought it would get more saturated at, you know, it's like we trained it on around 15 trillion tokens.
I guess our prediction going in was that it was going to asymptote more.
But even by the end, it was still learning, right? It's like we probably could have fed it more tokens and it would have gotten somewhat better. But I mean, at some point, you know, you're running a company, you need to do these meta reasoning questions of like, all right, how do I want to spend our GPUs? On like training this 70 billion model further? Do we want to kind of get on with it so we can start testing hypotheses for Llama 4? So we kind of needed to make that call, and I think we got it, we, I think we
There will be others in the future where, you know, the 70 billion multimodal one that'll come over the next period.
But yeah, I mean, that was fascinating that the architectures at this point can just take so much data.
Okay.
But does that mean like the Llama 4?
In the same order of magnitude.
This is one of the great questions that I think no one knows.
It's one of the trickiest things in the world to plan around is when you have an exponential curve, how long does it keep going for?
Yeah.
I think it's likely enough that it will keep going, that it is worth investing the tens of billions or 100 billion plus in building the infrastructure, on the assumption that if that kind of keeps going, you're going to get some really amazing things that are just going to make amazing products.
But I don't think anyone in the industry can really tell you that it will continue scaling at that rate for sure.
In general, in history, you hit bottlenecks at certain points.
And now there's so much energy on this that maybe those bottlenecks get knocked over pretty quickly.
But I don't know.