Dwarkesh Patel
And maybe you can learn something from that.
So first, with longer context,
Gemini 3.1 is 50% more expensive if you go over 200k tokens than if you're below 200k tokens.
I mean, at a high level, I understand why that might be, but why specifically 50%?
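The pricing step described here can be sketched as a small calculation. This is only an illustration: the base price of $2.00 per million input tokens is a made-up placeholder, and the sketch assumes the 50% surcharge applies to the whole request once it crosses 200k tokens, which the conversation does not confirm.

```python
def input_cost(tokens: int,
               base_price_per_mtok: float,
               threshold: int = 200_000,
               surcharge: float = 0.5) -> float:
    """Dollar cost of a prompt under a two-tier long-context price.

    Assumption: once the prompt exceeds `threshold` tokens, every token
    in the request is billed at the higher rate.
    """
    rate = base_price_per_mtok
    if tokens > threshold:
        rate *= 1.0 + surcharge  # 50% more expensive above the threshold
    return tokens / 1_000_000 * rate


# Hypothetical base price: $2.00 per million input tokens.
short_prompt = input_cost(100_000, 2.0)  # billed at the base rate
long_prompt = input_cost(400_000, 2.0)   # billed at 1.5x the base rate
```

Comparing the two calls shows the jump in per-token price at the threshold, which is the signal the question about "why specifically 50%" is trying to interpret.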
one, six, six, seven.
Like about one kilobyte, almost two kilobytes.
Ah, yeah.
It's funny that they would leak so much information through their API pricing.
Maybe we can learn something about the difference in input versus output prices, and what that tells us about decode versus pre-fill in these models.
And I think, last I checked, it's like 50% more expensive or something like that?
Let's say it's five times more expensive.
Okay.
This is the compute to process the next token in decode. Suppose you're doing pre-fill: you're not just processing the most recent token, you're processing all the tokens in parallel.
So I want to say that it would be this times len, len pre-fill?
Length of the pass in general, yeah.
Okay, yeah, yeah.
So maybe like prefix?
Sure.
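The scaling argument in this exchange, per-token compute times the length of the pass, can be written down as a rough sketch. The function names are hypothetical, and the "2 FLOPs per parameter per token" figure is a common rule of thumb assumed here for illustration; it ignores the attention cost over the context and is not something stated in the conversation.

```python
def decode_flops_per_token(n_params: float) -> float:
    """Approximate compute to process one new token in decode.

    Assumption: ~2 FLOPs per parameter per token (one multiply and one
    add), ignoring attention over the existing context.
    """
    return 2.0 * n_params


def prefill_flops(n_params: float, prefill_len: int) -> float:
    """Approximate compute for a pre-fill pass over `prefill_len` tokens.

    All prompt tokens are processed in parallel, so the total is the
    per-token compute multiplied by the length of the pass.
    """
    return decode_flops_per_token(n_params) * prefill_len


# Hypothetical 1B-parameter model with a 1,000-token prompt.
one_step = decode_flops_per_token(1e9)
whole_prompt = prefill_flops(1e9, 1_000)
ratio = whole_prompt / one_step  # equals the prefill length
```

Under these assumptions the total compute grows linearly with the number of tokens in the pass, while a decode step pays the per-token cost once for each generated token.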