
Reiner Pope

👤 Speaker
1157 total appearances


Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Great.

I mean, to jump to the conclusion a little bit: the big effect is batch size. What we're going to do now is quantify exactly what that looks like and what its implications are for latency and cost.

There's going to be another effect, which you can call speculative decoding or multi-token prediction.

We can maybe come back to that later, but I think the first thing that we'll talk through is batch size.

So what I'd like to introduce is sort of the two principles of analysis.

Firstly, we're going to look at a roofline analysis of how I run a transformer model on a cluster of chips.

We'll take a sort of, let's say, a Blackwell NVL72 cluster, so a rack of 72 GPUs.

And so the roofline analysis means we look at memory bandwidth and compute performance.

And then the other side of that is that we're going to look at just two simple factors of the model: the time to operate on the weights, and the time to operate on the context, the KV cache.
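As a rough sketch of those two terms, one can tally the bytes each decode step must pull from memory: the weights are streamed once per step and shared across the batch, while each sequence's KV cache is streamed separately. Every constant below is an illustrative assumption, not a figure from the conversation.

```python
# Illustrative lower bound on per-step decode time from memory traffic alone.
# All constants are assumptions for the sake of the sketch.

WEIGHT_BYTES = 70e9          # e.g. a 70B-parameter model stored at 1 byte/weight (assumption)
KV_BYTES_PER_TOKEN = 160e3   # KV-cache bytes per token of context (assumption)
MEM_BW = 72 * 8e12           # 72 GPUs at ~8 TB/s of HBM bandwidth each (assumption)

def memory_time(context_len, batch_size):
    """Time to stream the weights once plus each sequence's KV cache."""
    weight_traffic = WEIGHT_BYTES  # read once per decode step, amortized over the batch
    kv_traffic = batch_size * context_len * KV_BYTES_PER_TOKEN  # grows with the batch
    return (weight_traffic + kv_traffic) / MEM_BW
```

Note the asymmetry this exposes: the weight term is amortized over the whole batch, while the KV-cache term grows with it.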

So let's jump in.

What we're going to try and do is we're going to try and estimate the time that it takes to run an inference of a certain shape.


We're not perfect here.

We can't exactly predict the time.

And so instead, we're going to approximate.

And so we're going to say that the time must be greater than or equal to a certain quantity.

And so we're going to consider two different aspects.

We're going to look at the time it takes to do the memory fetches and then the time it takes to do the compute.
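Those two times combine into the roofline bound: a step can finish no sooner than its compute or its memory traffic, whichever takes longer. A minimal sketch (the numbers passed in would come from chip specs):

```python
def roofline_time(flops, bytes_moved, peak_flops, mem_bw):
    """Roofline lower bound on step time: the larger of the compute time
    and the memory-fetch time, since neither can be skipped."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / mem_bw
    return max(compute_time, memory_time)

# At small batch sizes, decoding is typically memory-bound: the bytes moved
# dominate relative to the hardware's compute-to-bandwidth balance point.
```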

And it'll turn out that this actually gives us very strong predictive power, even with an approximation this simple.

So one by one, what is the time that it takes to do the compute?
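A common back-of-the-envelope answer, sketched here as an assumption rather than taken from the transcript, is that a forward pass through the weights costs about 2 FLOPs per parameter per token, so for one decode step over a batch:

```python
def compute_time(n_params, batch_size, peak_flops):
    """Per-step compute time, using the common ~2 FLOPs/parameter/token estimate."""
    flops = 2 * n_params * batch_size  # one new token per sequence in the batch
    return flops / peak_flops
```

Unlike the memory-side weight term, this grows linearly with batch size, which is why batching eventually shifts decoding from memory-bound to compute-bound.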

โ† Previous Page 1 of 58 Next โ†’