Dwarkesh Patel

Speaker
15267 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

And maybe you can learn something from that.

So first, with longer context,

Gemini 3.1 is 50% more expensive if you go over 200k tokens than if you're below 200k tokens.

I mean, at a high level, I understand why that might be, but why specifically 50%?

one, six, six, seven.

Like about one kilobyte, almost two kilobytes.

Ah, yeah.

It's funny that they would leak so much information through their API pricing.

Maybe we can learn something about the difference in input versus output prices, and what that tells us about decode versus pre-fill in these models.

And I think, last I checked, it's like 50% more expensive or something like that?

Let's say it's five times more expensive.
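One common way to reason about why output (decode) tokens can cost more than input (pre-fill) tokens is a roofline model: pre-fill pushes many tokens through the weights at once and tends to be compute-bound, while decode emits one token per sequence per step and tends to be bound by reading the weights from memory. The sketch below assumes exactly that simplified model; the function name `step_seconds` and every model and hardware number are hypothetical placeholders, KV-cache reads and attention FLOPs are ignored, and nothing here says how any provider actually sets prices.

```python
# Illustrative roofline model of pre-fill vs decode cost per token.
# ASSUMPTION: every number below is a hypothetical placeholder, not data
# from the conversation; KV-cache traffic and attention FLOPs are ignored.

def step_seconds(n_params: float, tokens_in_step: int, bytes_per_param: float,
                 peak_flops: float, mem_bandwidth: float) -> float:
    """Time for one forward step over `tokens_in_step` tokens.

    Compute: ~2 FLOPs per parameter per token (one multiply-add per weight).
    Memory: every weight is read from memory once per step.
    The step takes whichever of the two is slower.
    """
    compute_s = 2 * n_params * tokens_in_step / peak_flops
    memory_s = n_params * bytes_per_param / mem_bandwidth
    return max(compute_s, memory_s)


# Hypothetical 70B-parameter model in bf16 on a hypothetical accelerator.
N_PARAMS = 70e9
BYTES_PER_PARAM = 2
PEAK_FLOPS = 1e15        # FLOP/s
MEM_BANDWIDTH = 3e12     # bytes/s
BATCH = 64               # sequences decoded together
PROMPT_LEN = 2_000       # tokens per prompt

# Pre-fill: all prompt tokens of all sequences go through in one step.
prefill_tokens = BATCH * PROMPT_LEN
prefill_per_token = step_seconds(N_PARAMS, prefill_tokens, BYTES_PER_PARAM,
                                 PEAK_FLOPS, MEM_BANDWIDTH) / prefill_tokens

# Decode: one new token per sequence per step, so the weight read is
# amortized over only BATCH tokens.
decode_per_token = step_seconds(N_PARAMS, BATCH, BYTES_PER_PARAM,
                                PEAK_FLOPS, MEM_BANDWIDTH) / BATCH

ratio = decode_per_token / prefill_per_token
print(f"decode costs {ratio:.1f}x more per token than pre-fill")
```

Under these made-up numbers decode stays memory-bound until the batch reaches about `BYTES_PER_PARAM * PEAK_FLOPS / (2 * MEM_BANDWIDTH)`, roughly 333 sequences; past that point the per-token costs of the two phases converge.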

Okay.

This is the compute to process the next token in decode. Now suppose you're doing pre-fill: you're not just processing the most recent token, you're processing all the tokens in parallel.

So I want to say that it would be this times len, len pre-fill?

Len of the pass in general, yeah.

Okay, yeah, yeah.

So maybe like prefix?

Sure.
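The exchange above estimates pre-fill compute as the per-token decode compute multiplied by the length of the pass (the prefix). A minimal sketch checking that arithmetic, assuming a dense decoder-only transformer, the common approximation of about 2 FLOPs per parameter per token for the weight matmuls, and standard attention where `n_heads * head_dim == d_model`; the function names and the model shape are hypothetical.

```python
# Hypothetical FLOP counts for one decode step vs a full pre-fill pass.
# ASSUMPTION: dense decoder-only transformer, standard attention
# (n_heads * head_dim == d_model); model sizes are placeholders.

def decode_step_flops(n_params: int, n_layers: int, d_model: int,
                      context_len: int) -> int:
    """FLOPs to produce one token that attends to `context_len` positions."""
    weight_flops = 2 * n_params                             # one multiply-add per weight
    attention_flops = 4 * n_layers * d_model * context_len  # QK^T plus weighted sum of V
    return weight_flops + attention_flops


def prefill_flops(n_params: int, n_layers: int, d_model: int,
                  prefix_len: int) -> int:
    """FLOPs to process a whole prefix in parallel under a causal mask.

    Token t attends to positions 1..t, so pre-fill is the sum of the
    per-token costs: the work is the same as decoding, only the
    parallelism differs.
    """
    return sum(decode_step_flops(n_params, n_layers, d_model, t)
               for t in range(1, prefix_len + 1))


def prefill_flops_closed_form(n_params: int, n_layers: int, d_model: int,
                              prefix_len: int) -> int:
    """Same total: the weight term is linear in prefix_len, attention quadratic."""
    p = prefix_len
    return 2 * n_params * p + 2 * n_layers * d_model * p * (p + 1)


# Placeholder model shape, roughly 8B-parameter scale, chosen for illustration.
N, LAYERS, D = 8_000_000_000, 32, 4096
P = 1_000

total = prefill_flops(N, LAYERS, D, P)
weight_only = prefill_flops(N, 0, D, P)   # attention switched off
share = (total - weight_only) / total
print(f"attention share of pre-fill FLOPs at prefix length {P}: {share:.1%}")
```

With attention dropped, the total is exactly `prefix_len` times the decode-step cost, which matches the "this times len" estimate; the attention term grows quadratically in the prefix length, so it becomes a larger share of the work as contexts get longer.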