Reiner Pope
And then you think, how might you put a pricing structure on top of that?
You would like to ensure that no matter what the context length is, you are still profitable.
Interesting.
So we've got a two-tier pricing structure. Maybe we've got something that looks like this: one price up to some context length, and another beyond it. Fascinating. So I think it says something: given that the bump is at 200k, it probably means that this is somewhat aligned with this crossover point. Maybe not exactly aligned, but fascinating.
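The two-tier structure being described can be sketched as a simple step function. This is a hypothetical illustration: the 200k threshold is the bump discussed above, but the dollar rates are made-up placeholders, not real prices.

```python
# Hypothetical two-tier pricing: one per-token rate up to a context-length
# threshold, a higher rate beyond it. The 200k threshold matches the bump
# discussed; the rates themselves are invented for illustration only.
def price_per_million_tokens(context_len: int,
                             threshold: int = 200_000,
                             low_rate: float = 3.0,
                             high_rate: float = 6.0) -> float:
    """Return the assumed input price ($ per 1M tokens) for a given context length."""
    return low_rate if context_len <= threshold else high_rate

# Requests below the threshold pay the base rate; longer contexts pay more.
print(price_per_million_tokens(100_000))  # 3.0
print(price_per_million_tokens(500_000))  # 6.0
```

The point of the threshold is exactly the one made above: if serving long contexts crosses from compute-bound to memory-bound around 200k tokens, a single flat rate cannot stay profitable on both sides of that crossover.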
So we can actually probably even complete that calculation, just to see where it lands.
We can solve for the number of bytes per token if we sort of make some assumptions about the number of active parameters.
Solving for the number of bytes per token: we're going to assume the point where we equalize the memory time and the compute time is, let's say, 200k tokens. So we equalize these two. We're also going to assume that the batch size is large enough that the memory time spent on weights is negligible, so we'll forget about that and focus on the actual memory time spent on KV cache.
That ends up saying, copying this term over: batch times context length times bytes per token over memory bandwidth is going to be equal to number of activated params over FLOPS.
And then we're going to solve for bytes per token.
batch size was missing here.
Shows up here, and then it cancels out by the time we get to here.
And I dropped the len context.
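The balance being set up, with the batch size written in on both sides so that it visibly cancels, can be written as follows (the symbol names are mine, not from the talk):

```latex
\frac{B \cdot L_{\text{ctx}} \cdot \text{bytes/token}}{\text{BW}_{\text{mem}}}
  = \frac{B \cdot N_{\text{active}}}{\text{FLOPS}}
\quad\Longrightarrow\quad
\text{bytes/token}
  = \frac{N_{\text{active}}}{L_{\text{ctx}}} \cdot \frac{\text{BW}_{\text{mem}}}{\text{FLOPS}}
```

Here $B$ is the batch size, $L_{\text{ctx}}$ the context length, $N_{\text{active}}$ the number of activated parameters, and $\text{BW}_{\text{mem}}$ the memory bandwidth. Note that after solving, the context length ends up in the denominator.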
So we can plug in numbers.
This number, this is, well, is the reciprocal of the number that we saw before?
Yeah, this is like one over 300, which is reasonably stable across many different hardware platforms.
We conjecturally said that maybe the number of activated parameters is like 100 billion.
And length of the context we said was 200k.
Something is wrong here.
The length of the context should be on the denominator, not the numerator.
That is plausible actually.
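With the context length moved to the denominator, the numbers quoted above can be plugged in directly. This is a sketch using the discussion's conjectured values: a memory-bandwidth-to-FLOPS ratio of roughly 1/300 and 100 billion activated parameters.

```python
# Conjectured inputs from the discussion above.
bw_over_flops = 1 / 300      # memory bandwidth / FLOPS, roughly stable across hardware
n_active_params = 100e9      # assumed number of activated parameters
len_context = 200_000        # crossover context length, in tokens

# bytes/token = (N_active / L_ctx) * (BW_mem / FLOPS), context length in the denominator
bytes_per_token = (n_active_params / len_context) * bw_over_flops
print(f"{bytes_per_token:.0f} bytes per token")  # about 1667, i.e. ~1.7 KB of KV cache per token
```

Around 1.7 KB of KV cache per token is indeed a plausible figure for a large model, which is consistent with the "that is plausible" conclusion above.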