Dylan Patel
So from 4 to 4 Turbo, the model was less than half the size.
And from 4 Turbo to 4o, 4o's cost is way lower than 4's.
And they just kept shrinking the cost.
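To put rough numbers on that cost trend, here's a quick sanity check of the ratios using the launch-era API list prices as I remember them; treat the figures as approximate, they're for illustrating the direction, not exact accounting:

```python
# Launch-era API list prices per 1M tokens (input, output), recalled
# from public pricing pages; approximate, for the cost trend only.
prices = {
    "gpt-4":       (30.0, 60.0),
    "gpt-4-turbo": (10.0, 30.0),
    "gpt-4o":      (5.0, 15.0),
}

base_in, base_out = prices["gpt-4"]
for model, (p_in, p_out) in prices.items():
    # Each generation's price as a fraction of original GPT-4's.
    print(f"{model}: input {p_in / base_in:.0%} of GPT-4, "
          f"output {p_out / base_out:.0%} of GPT-4")
```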
Now with 5, what could they have done?
They could have gone, oh, we'll take a big step up in scale.
They actually tried that with 4.5.
They screwed up some things because it was really hard to get, you know, 100,000 GPUs to work properly.
There's challenges there.
Also, they hadn't figured out the whole reinforcement learning paradigm at that time.
The scaling laws are basically a chart of quality versus compute, but that compute breaks down into: how much bigger do I make the model, and how much more data do I put into the model?
And if the internet only has so many tokens, you're kind of screwed.
There was potentially a cliff until reinforcement learning came along, where you can generate data and train the model to get better without the internet having to contain that data.
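To make that compute breakdown concrete, here's a rough sketch using the commonly cited C ≈ 6·N·D approximation for transformer training FLOPs (N = parameter count, D = training tokens); the formula choice and every number here are illustrative assumptions, not figures from this conversation:

```python
# Rough sketch of the scaling-law tradeoff: with compute fixed,
# C ~= 6 * N * D means a bigger model leaves room for fewer tokens.
# All numbers are hypothetical, not OpenAI's actual budgets.

BUDGET_FLOPS = 1e25  # hypothetical fixed training compute budget

for n_params in (7e10, 2e11, 1e12):  # 70B, 200B, 1T parameters
    # Solve C = 6 * N * D for D, the affordable token count.
    n_tokens = BUDGET_FLOPS / (6 * n_params)
    print(f"{n_params:.0e} params -> {n_tokens:.2e} training tokens")

# If usable internet text tops out around ~1e13 tokens (a loose guess),
# the 70B run above already wants ~2.4e13 tokens, more than exists.
# That's the data cliff, until RL lets you generate training data
# instead of scraping it.
```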
So they kind of had this problem of: you have X amount of compute, and you can serve your users with it.
But hey, today, if people want to use my API, I rate limit them because I can't actually serve them all.
I have to rate limit the people who have ChatGPT Free, Pro, and Max, whatever the $200 deal is.
There are different rate limits for each tier.
You can only run Deep Research so many times.
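As a loose illustration of that kind of tiered limiting, here's a minimal token-bucket rate limiter; the tier names and quotas are made up for the example and say nothing about OpenAI's actual limits or implementation:

```python
import time

# Hypothetical per-tier quotas: (requests, refill window in seconds).
TIER_LIMITS = {
    "free": (10, 60.0),
    "pro":  (100, 60.0),
    "max":  (1000, 60.0),
}

class TokenBucket:
    """One bucket per user tier; each request spends one token."""

    def __init__(self, capacity: int, period: float):
        self.capacity = capacity
        self.refill_rate = capacity / period  # tokens per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would return a 429-style "rate limited" error

buckets = {tier: TokenBucket(*cfg) for tier, cfg in TIER_LIMITS.items()}
print(buckets["free"].allow())  # True until the free bucket drains
```

The point of the bucket shape is exactly the tradeoff in the transcript: capacity is fixed, so lower-paying tiers get smaller buckets and drain them sooner.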