Dwarkesh Patel
keep write everything you want to it or take everything out of it.
Or we don't want to be in a situation where our ability to write back and forth is so big, or sorry, so small compared.
Yeah, makes sense.
Makes a ton of sense.
Okay, so a couple of actually quick questions.
One, if it is the case that the optimal batch size is something like 2,000, and that's actually true, it's totally dependent on sparsity.
It's not dependent on the model size or anything.
But that's a very interesting result.
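One way to see why the batch size could depend on sparsity but not on model size is a simple roofline-style estimate. The sketch below is not from the conversation. It assumes a mixture-of-experts model where every decode step reads all weights from memory once, each sequence costs about 2 FLOPs per active parameter, and KV-cache traffic is ignored. The function name and the hardware figures are hypothetical, picked only so the arithmetic is easy to follow.

```python
def critical_batch_size(peak_flops, mem_bandwidth, bytes_per_param,
                        total_params, active_params):
    """Batch size at which compute time equals weight-load time (rough estimate)."""
    # Assumption: every parameter is read from memory once per decode step.
    load_time = total_params * bytes_per_param / mem_bandwidth
    # Assumption: each sequence costs ~2 FLOPs per *active* parameter per step.
    compute_time_per_seq = 2 * active_params / peak_flops
    # Below this batch size the step is bandwidth-bound; above it, compute-bound.
    return load_time / compute_time_per_seq


# Hypothetical accelerator: 1e15 FLOP/s, 3e12 bytes/s, 1 byte per parameter.
# Two models with the same sparsity (total/active = 12) but 10x different size.
small = critical_batch_size(1e15, 3e12, 1, total_params=1e11, active_params=1e11 / 12)
large = critical_batch_size(1e15, 3e12, 1, total_params=1e12, active_params=1e12 / 12)
print(small, large)
```

Under these assumptions the parameter count cancels out, leaving only the hardware ratio times total/active parameters, so both models land on the same value (about 2,000 with these made-up numbers). Real systems would also have to account for KV-cache reads and for how tokens are routed to experts.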
And that seems to imply that you can...
One question is, how much of a push towards centralization is it that you would have these economies of scale from inference, from batching?
But it seems like it's not that big a deal.
Like, I don't know, is 2,000 users at the same time a lot?
It doesn't seem like a lot?
But I mean, Gemini is big.
Actually, one thousandth of Gemini is a lot.
To actually be like...
To be competitive at scale, you need to be able to serve at least one thousandth of Gemini?
Yeah.
That's interesting.
Cool.