Mark Zuckerberg
So I think any specific thing that I sort of
thought would be valuable, we'd probably be building.
But I think you'll get distilled versions.
I think you'll get smaller versions.
I mean, one thing that I think is... 8 billion, I don't think is quite small enough for a bunch of use cases, right?
I think like over time, I'd love to get, you know, a billion-parameter model or a 2-billion-parameter model, or even like a, I don't know, maybe like a 500-million-parameter model and see what you can do with that.
Because I mean, as they start getting...
If, with 8 billion parameters, we're basically nearly as powerful as the largest Llama 2 model, then with a billion parameters, we should be able to do something that's interesting, right?
And faster, good for classification or a lot of kind of like basic things that people do before kind of understanding the intent of a user query and feeding it to the most powerful model to kind of hone what the prompt should be.
So I don't know.
I think that's one thing that maybe the community can help fill in.
But I mean, we're also thinking about getting around to distilling some of these ourselves.
But right now the GPUs are training the 405.
That's the whole fleet.
I mean, we built two...
I think it's like 22,000- or 24,000-GPU clusters that are kind of the single clusters that we have for training the big models.
I mean, obviously across a lot of the stuff that we do, a lot of our stuff goes towards training like Reels models and like Facebook News Feed and Instagram feed.
And then inference is a huge thing for us because we serve a ton of people, right?
So our ratio of inference models