Dylan Patel
By the end of this year, roughly a couple of gigawatts of capacity too.
How do they maximize their serving capacity with this?
One avenue is we continue to serve big models: we make bigger models and the tokens are more expensive. But this log-log scaling is really challenging, because yes, the value is way more, but the cost is way more.
And then the real whammy is that the user experience is way worse.
If I serve a massive, massive model, it's slow and users are fickle.
You need the response to be faster than they can read.
Hard to calibrate, yeah.
Yeah.
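(A quick back-of-the-envelope on that "faster than they can read" threshold, as a minimal Python sketch; the reading speed and tokens-per-word figures are illustrative assumptions, not numbers from the conversation.)

```python
# Back-of-the-envelope: how fast must a model stream tokens so the
# user never waits on it? Figures below are illustrative assumptions.

READING_WPM = 250       # assumed average adult reading speed, words/min
TOKENS_PER_WORD = 1.3   # rough English tokens-per-word ratio (assumption)

# Minimum decode throughput for the response to outpace the reader.
min_tokens_per_sec = READING_WPM * TOKENS_PER_WORD / 60
print(f"Need at least ~{min_tokens_per_sec:.1f} tokens/sec to stay ahead of a reader")
# -> roughly 5-6 tokens/sec; a very large model served slowly can dip
#    below this, which is the user-experience problem described above.
```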
So there's this user experience challenge. But really, in the end, for a given model level, I think there's a saturation point for how much demand for intelligence there is.
You can only have such a large child army, right? Of people digging trenches, or like Kony 2012, whatever it is. This is very cancelable, but you know. But you can have a much larger army, or business, at a higher level of intelligence.
When you think about it, what could I have done with GPT-3? Even if we had paused there, paused the model capabilities, right? Obviously, the cost to serve a model of GPT-3's quality is now something like 2,000 times cheaper.
And then GPT-4, same thing, right?
People were freaking out about DeepSeek because it was like 500, 600 times cheaper. Then GPT-OSS came out, and that's even cheaper than that, for roughly the same quality. Actually, I would argue the GPT open-source model is a little bit better than GPT-4-OG because it can do tool calling.
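(To make that arithmetic concrete, here's a minimal Python sketch of the cost-decline claim; both prices are hypothetical placeholders chosen to illustrate a ~2,000x ratio, not published pricing.)

```python
# Illustrative sketch of the cost-per-token decline described above.
# Prices are hypothetical placeholders in dollars per million tokens,
# not actual published pricing.

price_gpt3_launch_era = 20.00  # assumed launch-era price, $/M tokens
price_equiv_today = 0.01       # assumed current price for GPT-3-quality output

ratio = price_gpt3_launch_era / price_equiv_today
print(f"Serving cost down ~{ratio:,.0f}x for the same quality")  # ~2,000x
```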