Reiner Pope
I mean, I think starting with equalizing in cost is right, but depending on how you model the cost, this comes close to equalizing in data.
Which way are people going to err?
If you think that people's power of prediction is not perfect, and you also run the risk of building a model that turns out not to be a frontier model and just gets thrown away, then that changes the cost trade-off: there's some probability that applies to the inference, so you should derate the inference tokens by some amount.
Yeah, so I think we just have to make some real-world assumptions here in order to do that.
So the inference tokens we should totally be able to catch, right?
So let's say a few hundred million, I don't know, maybe it's like 500 million tokens a second now, I don't really know.
500 million tokens a second, times the two months a model is deployed before it becomes obsolete?
I don't really know.
I can't do this in my head.
Can you type it into a computer?
2.6 times 10 to the 15th. Okay, 2.6 times 10 to the 15th. This number is probably too large, because this is going to be multiple models in a family, so let's make it five times smaller or ten times smaller, something like that.
Okay, so we're estimating maybe 50 million tokens per second per specific model.
The model is live for two months, and so this comes out to around 200 trillion tokens.
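(A minimal sketch of that back-of-envelope arithmetic in Python; the 500 million tokens/second rate, the roughly 10x derate for a single model within the family, and the two-month lifetime are all assumed round numbers from the conversation, not measured figures.)

```python
# Back-of-envelope: inference tokens served over one model's frontier lifetime.
# All inputs are assumed round numbers from the conversation, not measured values.

family_tokens_per_sec = 500e6          # assumed serving rate across the whole model family
family_derate = 10                     # assume one specific model sees ~1/10 of that traffic
model_tokens_per_sec = family_tokens_per_sec / family_derate   # ~50 million tokens/s

lifetime_seconds = 2 * 30 * 24 * 3600  # assumed ~two months before the model is obsolete

inference_tokens = model_tokens_per_sec * lifetime_seconds
print(f"lifetime inference tokens ~ {inference_tokens:.2e}")   # ~2.6e14, a few hundred trillion
```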
And then we want to compare that to active parameters on a frontier model.
I don't actually know the latest rumors, but some...
Do you know who for?
Trained on 150 trillion tokens.
Interesting.
Which is similar.
Yeah, that's actually similar.
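(Continuing the sketch: putting the lifetime-inference estimate next to the rumored training-set size mentioned above; both figures are assumptions carried over from the conversation, not confirmed numbers.)

```python
# Compare the lifetime-inference estimate with the rumored training-set size.
# Both inputs are assumptions from the conversation.

inference_tokens = 2.6e14   # ~260 trillion, from the estimate above
training_tokens = 1.5e14    # rumored ~150 trillion training tokens

ratio = inference_tokens / training_tokens
print(f"inference / training ~ {ratio:.1f}x")   # ~1.7x: same order of magnitude
```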