Dwarkesh Patel
I never thought about it in terms of how much...
Basically, for GPT-5 to be trained optimally, the total amount of tokens streamed to every single user who uses it should equal the total amount that has gone into pre-training.
Yeah.
And the total amount of tokens that have gone into pre-training is the sum of all human knowledge.
So each model should generate the sum of human knowledge on the output that it gets on the input.
Right.
And then can we back out how much more compute than Chinchilla-optimal for a given sized...
Somebody told me 150 trillion.
Active params?
Sorry, I meant tokens.
Oh, I see.
So how much is it over-trained?
That's whatever.
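As a rough sketch of the over-training question: the commonly cited Chinchilla rule of thumb is about 20 training tokens per parameter, so the over-training factor is the actual token count divided by 20 times the parameter count. The 150 trillion token figure is the one quoted above; the parameter count below is purely hypothetical, since the conversation does not give one.

```python
# Rule of thumb, not an exact constant: Chinchilla-optimal training uses
# roughly 20 tokens per parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20.0


def overtraining_factor(train_tokens: float, active_params: float,
                        tokens_per_param: float = CHINCHILLA_TOKENS_PER_PARAM) -> float:
    """Ratio of actual training tokens to the Chinchilla-optimal token count."""
    optimal_tokens = tokens_per_param * active_params
    return train_tokens / optimal_tokens


# Figure quoted in the conversation (unverified).
train_tokens = 150e12
# HYPOTHETICAL: an illustrative active-parameter count, not from the source.
hypothetical_active_params = 100e9

factor = overtraining_factor(train_tokens, hypothetical_active_params)
print(f"Over-trained by about {factor:.0f}x relative to Chinchilla-optimal")
```

Changing the hypothetical parameter count scales the answer inversely, so this only shows the shape of the calculation.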
Okay, so if you consider this right here, to the extent this is in the right ballpark,
just by thinking about, okay, you kind of want everything to be equal in terms of compute.
If OpenAI also realizes that, and they're serving a certain amount of tokens per second, that tells you how much data went into the pre-training of GPT-5.
Even if it's like 50% off or something, it is sort of wild that you can sort of first-principles these kinds of numbers.
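A minimal sketch of that back-of-the-envelope argument, assuming the simplification used in the conversation: lifetime tokens served equal pre-training tokens. The function names and the one-year deployment window are illustrative assumptions, not figures from the discussion.

```python
SECONDS_PER_DAY = 86_400


def implied_pretraining_tokens(tokens_per_second: float, deployment_days: float) -> float:
    """Pre-training tokens implied by a serving rate, if lifetime served tokens
    equal pre-training tokens (the assumption made in the conversation)."""
    return tokens_per_second * SECONDS_PER_DAY * deployment_days


def required_serving_rate(pretraining_tokens: float, deployment_days: float) -> float:
    """Inverse: average tokens per second needed to serve as many tokens as
    went into pre-training over the deployment window."""
    return pretraining_tokens / (SECONDS_PER_DAY * deployment_days)


# 150 trillion tokens is the figure quoted in the conversation (unverified).
# ASSUMPTION: a one-year deployment window, chosen only for illustration.
rate = required_serving_rate(150e12, deployment_days=365)
print(f"Average serving rate needed: {rate:,.0f} tokens/second")
```

Going the other way, plugging an observed serving rate into `implied_pretraining_tokens` gives the kind of first-principles estimate described here, with the result only as good as the assumed window and the equal-tokens premise.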
OK, so in the spirit of trying to deduce things, we can publicly look up the prices of the APIs of these models.