Reiner Pope

👤 Speaker
1157 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

And then you think, how might you put a pricing structure on top of that?

You would like to ensure that no matter what the context length is, you are still profitable.

Interesting.

So we've got a two-tier pricing structure. Maybe we've got something that looks like this up to some context length, and then the next tier beyond it. Fascinating. So I think it says something: given that the bump is at 200k, it probably means that this is somewhat aligned with this crossover point, maybe not exactly aligned with it. Fascinating.
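A minimal sketch of the kind of two-tier schedule being described, assuming a breakpoint at 200k context tokens; the tier prices here are made-up placeholders, not anyone's actual rates:

```python
# Hypothetical two-tier pricing sketch: a lower rate up to a 200k-token
# context breakpoint and a higher rate beyond it. All numbers are
# placeholders for illustration, not real published prices.
CONTEXT_BREAKPOINT = 200_000   # tokens, the "bump" discussed above
PRICE_SHORT = 3.0              # $ per million input tokens below the breakpoint (assumed)
PRICE_LONG = 6.0               # $ per million input tokens above the breakpoint (assumed)

def price_per_million_input_tokens(context_len: int) -> float:
    """Assumed input-token price for a request with this context length."""
    return PRICE_SHORT if context_len <= CONTEXT_BREAKPOINT else PRICE_LONG

for n in (50_000, 200_000, 400_000):
    print(f"{n:>7} tokens -> ${price_per_million_input_tokens(n):.2f} per million input tokens")
```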

So we can actually probably even complete that calculation, just to see where it lands.

We can solve for the number of bytes per token if we sort of make some assumptions about the number of active parameters.

Solving for the number of bytes per token: we're going to assume that the point where we equalize the time of memory and the time of compute is, let's say, 200k tokens. So we equalize these two. We're also going to assume that the batch size is large enough that the memory time spent on weights is negligible, so we'll forget about this and focus on the actual memory time spent on the KV cache.
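Restating the setup under those assumptions, with B the batch size, L the context length, b the KV-cache bytes per token, BW the memory bandwidth, P the number of activated parameters, and F the accelerator FLOPS (constant factors, like the 2 for multiply-accumulates, are dropped in this rough estimate):

```latex
T_{\text{mem}} \approx \frac{B \cdot L \cdot b}{BW},
\qquad
T_{\text{compute}} \approx \frac{B \cdot P}{F}
```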

That ends up saying, copying this term over: batch times context length times bytes per token over memory bandwidth is going to be equal to the number of activated params over FLOPS.

And then we're going to solve for bytes per token.

The batch size was missing here.

Shows up here, and then it cancels out by the time we get to here.
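With the batch size restored on both sides, the equality reads as below (same symbols as in the sketch above), and the batch factor cancels as noted:

```latex
\frac{B \cdot L \cdot b}{BW} = \frac{B \cdot P}{F}
\quad\Longrightarrow\quad
\frac{L \cdot b}{BW} = \frac{P}{F}
```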

And I dropped the len context.

So we can plug in numbers.

This number, is this the reciprocal of the number that we saw before?

Yeah, this is like one over 300, which is reasonably stable across many different hardware platforms.
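As a rough sanity check on that ratio, using approximate H100-class figures (around 3.35 TB/s of HBM bandwidth and roughly 990 dense BF16 TFLOPS; treat both as assumed, approximate specs):

```python
# Rough check of the bandwidth-to-FLOPS ratio quoted above, using
# approximate H100-class specs (assumed): ~3.35 TB/s HBM bandwidth,
# ~990 TFLOPS of dense BF16 compute.
mem_bandwidth_bytes_per_s = 3.35e12
flops_per_s = 990e12

ratio = mem_bandwidth_bytes_per_s / flops_per_s
print(f"bytes of bandwidth per FLOP ~ 1/{1 / ratio:.0f}")  # ~1/296, close to 1/300
```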

We conjecturally said that maybe the number of activated parameters is something like 100 billion.

And the length of the context, we said, was 200k.

Something is wrong here.

The length of the context should be on the denominator, not the numerator.
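Completing the calculation with the context length in the denominator, under the assumptions above (roughly 100B activated parameters, a 200k-token crossover, and a bandwidth-to-FLOPS ratio of about 1/300), a quick sketch:

```python
# bytes_per_token = P_active * (BW / FLOPS) / L_context, the corrected form
# with context length in the denominator. Inputs are the rough assumptions
# discussed above, not measured values.
active_params = 100e9        # assumed ~100B activated parameters
bw_over_flops = 1 / 300      # assumed bytes of bandwidth per FLOP
context_len = 200_000        # tokens, the assumed crossover point

bytes_per_token = active_params * bw_over_flops / context_len
print(f"KV cache per token ~ {bytes_per_token:.0f} bytes (~{bytes_per_token / 1024:.1f} KiB)")
# ~1,667 bytes, i.e. on the order of 1-2 KB per decoded token
```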

That is plausible actually.