Reiner Pope
So the high-level point, first of all, is that there is some amount of increasing cost with context length.
And we can bring that back up.
That was the memory time versus the compute time.
So we've put up these same equations from before: the time for the memory fetches, which is the weights and the KV cache, and the time for the compute, which is just the matrix multiplications for the weights.
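As a rough sketch of those two terms (with B the batch size, P the parameter count, S the context length, and c_KV the KV-cache bytes per token; these symbols are introduced here for illustration, not taken from the conversation):

$$
T_{\text{memory}} \approx \frac{\text{bytes}_{\text{weights}} + B \cdot S \cdot c_{\text{KV}}}{\text{HBM bandwidth}},
\qquad
T_{\text{compute}} \approx \frac{2\,B\,P}{\text{FLOPs per second}}
$$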
I will also draw the cost curve.
But this time I'll do it as a function of context length instead of as a function of batch size.
So the vertical axis here is just time.
So this is the cost curve as a function of context length.
We'll draw the compute.
The cost of the compute is actually constant as a function of context length.
There's no dependence here on context length.
In reality, there is some dependence, but it is a very mild dependence, so we'll ignore it.
So this line is the time for the compute.
And then we'll also draw the dependence of the memory fetch on context length.
And this starts at a large number for the weights and then grows gradually with the context length.
And so you take the maximum and you see there is this inflection point here.
So this is the cost that, for example, Gemini might be paying.
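As a rough numerical sketch of that picture, here is a small calculation with entirely illustrative hardware and model numbers (a hypothetical 70B-parameter model in bf16, roughly 3 TB/s of HBM bandwidth, roughly 1 PFLOP/s, and an assumed per-token KV-cache footprint); none of these figures come from the conversation. It just evaluates the two terms above and reports which one dominates at each context length:

```python
# Illustrative numbers only.
WEIGHT_BYTES = 70e9 * 2       # hypothetical 70B-parameter model, 2 bytes per weight (bf16)
KV_BYTES_PER_TOKEN = 40e3     # assumed KV-cache footprint per token
HBM_BW = 3.0e12               # assumed HBM bandwidth, bytes per second
FLOPS = 1.0e15                # assumed accelerator throughput, FLOPs per second
BATCH = 512                   # chosen large enough that compute dominates at short context

# Compute time: ~2 FLOPs per parameter per token, constant in context length.
compute_time = 2 * BATCH * 70e9 / FLOPS

for context_len in [1_000, 4_000, 16_000, 64_000, 256_000, 1_000_000]:
    # Memory time: stream the weights plus the whole KV cache once per decode step.
    memory_time = (WEIGHT_BYTES + BATCH * context_len * KV_BYTES_PER_TOKEN) / HBM_BW
    # The step time is set by whichever term is larger; the inflection point is the
    # context length where the growing memory term overtakes the flat compute term.
    bound = "memory-bound" if memory_time > compute_time else "compute-bound"
    print(f"{context_len:>9} tokens: {max(memory_time, compute_time) * 1e3:7.1f} ms  ({bound})")
```

With these assumed numbers the crossover lands in the low thousands of tokens; where the inflection point actually sits depends entirely on the hardware ratios and the KV-cache layout.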