
Reiner Pope

👤 Speaker
1157 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Let's sort of... So the high level, even in the first place, is...

there is some amount of increasing cost with context length.

And we can bring that back up.

That was the memory time versus the compute time.

So we've put up these same equations from before of the time for memory fetches, which is the weights and the KV cache, and then the time for the compute, which is just the matrix multiplications for the weights.
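
The two quantities being discussed can be sketched as code. This is a minimal illustration of the model, not figures from the episode: the bf16 assumption, the model size, the KV-cache size, and the hardware numbers below are all hypothetical.

```python
# Sketch of the decode-step cost model described above: memory time
# (stream all weights plus the KV cache once per step) vs. compute time
# (the weight matmuls). All numbers here are illustrative assumptions.

def decode_step_times(n_params, kv_cache_bytes, batch_size, hbm_bw, peak_flops):
    bytes_per_param = 2  # assumption: bf16 weights
    # Memory time: every decode step fetches the weights and the KV cache.
    t_mem = (n_params * bytes_per_param + kv_cache_bytes) / hbm_bw
    # Compute time: ~2 FLOPs per parameter per token, for the whole batch.
    t_compute = 2 * n_params * batch_size / peak_flops
    # The step takes whichever is larger.
    return t_mem, t_compute, max(t_mem, t_compute)

# Hypothetical 70B-parameter model, ~10 GB of KV cache, batch 32, on an
# accelerator with ~3.35e12 B/s HBM bandwidth and ~1e15 FLOP/s (assumed).
t_mem, t_compute, t_step = decode_step_times(
    n_params=70e9, kv_cache_bytes=10e9, batch_size=32,
    hbm_bw=3.35e12, peak_flops=1e15)
```

At this small batch the memory term dominates, which is the usual memory-bound regime of decoding.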

I will also draw the cost curve.

But this time I'll do it as a function of context length instead of as a function of batch size.

So this is time over, yeah, just time.

So this is the cost curve as a function of context length.

We'll draw the compute.

The cost of the compute is actually constant as a function of context length.

There's no dependence here on context length.

In reality, there is some dependence, but it is very mild dependence, so we'll ignore it.

So this is the time for the compute.

This one.

And then we'll also draw the dependence of the memory fetch on context length.

And this starts at a large number for the weights and then grows gradually with the context length.

So it starts maybe here, and then grows gradually with context length.

And so you take the maximum of the two, and you see there is this crossover point here.
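
That crossover can also be solved for in closed form: the flat compute line meets the growing memory line at one context length. The sketch below uses the same hypothetical bf16 and hardware assumptions as before; the per-token KV-cache size and batch size are likewise illustrative, not the episode's numbers.

```python
# Solve (w_bytes + batch * L * kv_per_token) / hbm_bw = 2 * n_params * batch / flops
# for the context length L where memory time overtakes compute time.
# All numbers are illustrative assumptions.

def crossover_context_length(n_params, kv_bytes_per_token, batch_size,
                             hbm_bw, peak_flops):
    w_bytes = 2 * n_params  # assumption: bf16 weights
    t_compute = 2 * n_params * batch_size / peak_flops
    # Memory bytes that can be streamed in the time the compute takes,
    # beyond what the weights themselves already consume.
    budget = t_compute * hbm_bw - w_bytes
    # If the weights alone exceed the budget, it's memory-bound at every
    # context length, so the crossover is at L = 0.
    return max(budget / (batch_size * kv_bytes_per_token), 0.0)

# Hypothetical: 70B params, ~320 kB of KV cache per token, batch 1024.
L_star = crossover_context_length(
    n_params=70e9, kv_bytes_per_token=320e3, batch_size=1024,
    hbm_bw=3.35e12, peak_flops=1e15)
# Below L_star a decode step is compute-bound; above it, memory-bound.
```

At a small batch the same function returns 0: the weights already make the step memory-bound, so the memory line sits above the compute line for every context length.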

So these are the costs that, for example, Gemini might be paying.