Dylan Patel
That's a tremendous question.
I'd probably still say capacity/cost is more important than latency.
I think existing levels of latency are fast enough for a lot.
If the latency were 10x lower for GPT-5, then they could have made a model that was 10x bigger and served it at the same speed, but then you would have the same capacity issue.
If you could have your cake and eat it, which is all the capacity in the world and the lowest latency in the world, you'd make the models way better.
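A back-of-envelope sketch of the trade-off described here, assuming (this is not from the conversation) that per-token decode latency scales roughly linearly with parameter count on fixed hardware; the model sizes and the milliseconds-per-billion-parameters constant are made up for illustration:

```python
# Rough sketch of the latency / model-size trade-off discussed above.
# Assumption (not from the transcript): per-token decode latency scales
# roughly linearly with active parameter count on fixed hardware, since
# decoding is dominated by streaming the weights through memory.

def per_token_latency_ms(params_b: float, ms_per_billion: float = 2.0) -> float:
    """Hypothetical decode latency per token for a dense model of
    `params_b` billion parameters; `ms_per_billion` is a made-up constant."""
    return params_b * ms_per_billion

base_params_b = 100.0                          # hypothetical model size
base_latency = per_token_latency_ms(base_params_b)

# A 10x latency win at the same size...
improved_latency = base_latency / 10

# ...could instead be spent on a 10x bigger model, landing back at the
# original perceived speed -- while the capacity/cost problem gets 10x worse.
bigger_params_b = base_params_b * 10
print(f"{base_params_b:.0f}B model: {base_latency:.0f} ms/token")
print(f"same model, 10x speedup: {improved_latency:.0f} ms/token")
print(f"{bigger_params_b:.0f}B model with that speedup: "
      f"{per_token_latency_ms(bigger_params_b) / 10:.0f} ms/token")
```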
It's the physical realities of, if I'm at OpenAI, what do I choose to do?
Do I invest more in the model that people can use and that's fast?
Do I invest a lot in the model that most people won't use because it's expensive, first of all, and even those that can afford it will often go back to the regular one?
I have access to Claude Opus 4.1.
I still use Sonnet way more.
Right.
It's slow.
My time's worth something.
If you had the magic button, I think OpenAI wouldn't have been afraid to make a model way, way, way bigger, even if it was a terrible user experience.
It's not necessarily even bigger, right?
Like there's this whole concept of over-parameterization, i.e., if you just throw more parameters at a neural network, it memorizes before it generalizes. I'll equate it to humans, right?
When you had a vocab test or you had some test, you memorized before you understood.
And it wasn't until you did multiple repetitions and in different forms that you actually understood the content rather than just memorized.
It takes cycles.
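A minimal sketch of that "memorize before you understand" effect, not from the conversation, using a deliberately over-parameterized polynomial fit as a stand-in for a neural network: with few examples it interpolates its training points (near-zero train error) but fails between them, and only with many, varied samples does it start to generalize.

```python
# Sketch (illustrative only): an over-parameterized model memorizes with
# few examples and only generalizes after enough varied repetitions.
import numpy as np

rng = np.random.default_rng(0)

def fit_and_eval(n_train: int, degree: int = 15) -> tuple[float, float]:
    """Fit a degree-15 polynomial (over-parameterized for this task) to
    noisy samples of sin(x); return (train MSE, test MSE)."""
    x_train = rng.uniform(-3, 3, n_train)
    y_train = np.sin(x_train) + rng.normal(0, 0.1, n_train)
    coeffs = np.polyfit(x_train, y_train, degree)
    x_test = np.linspace(-3, 3, 200)
    train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_mse = float(np.mean((np.polyval(coeffs, x_test) - np.sin(x_test)) ** 2))
    return train_mse, test_mse

# Few repetitions -> memorization (tiny train error, large test error);
# many varied repetitions -> something closer to understanding.
for n in (16, 40, 400):
    train_mse, test_mse = fit_and_eval(n)
    print(f"n_train={n:4d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```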