Reiner Pope
Great.
I mean, to jump a little bit to the conclusion, the big effect is batch size, but what we're going to do now is quantify exactly what that looks like and what its implications are for latency and cost.
There's going to be another effect, which you can call speculative decoding or multi-token prediction.
We can maybe come back to that later, but I think the first thing that we'll talk through is batch size.
So what I'd like to introduce is sort of the two principles of analysis.
Firstly, we're going to look at a roofline analysis of how I run a transformer model on a cluster of chips.
We'll take, let's say, a Blackwell NVL72 cluster, so a rack of 72 GPUs.
And so the roofline analysis means we look at memory bandwidth and compute performance.
And then the other side of that is that we're going to look at just two simple factors of the model: the time to operate on the weights, and then the time to operate on the context, the KV cache.
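To make those two factors concrete, here is a minimal Python sketch of the two memory terms. The model shape (a hypothetical dense 70B-parameter model) and the FP8 byte counts are illustrative assumptions, not figures from this discussion.

```python
# A minimal sketch of the two memory terms: streaming the weights and
# streaming one sequence's KV cache. The model shape and FP8 byte counts
# are illustrative assumptions, not figures from this discussion.

def weight_bytes(n_params: float, bytes_per_param: float = 1.0) -> float:
    """Bytes read to stream the model weights once (FP8 -> 1 byte/param)."""
    return n_params * bytes_per_param

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: float = 1.0) -> float:
    """Bytes read to stream one sequence's KV cache (a K and a V per layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

# Hypothetical dense 70B model: 80 layers, 8 KV heads of dim 128, 32k context.
w = weight_bytes(70e9)
kv = kv_cache_bytes(80, 8, 128, 32_000)
print(f"weights: {w / 1e9:.0f} GB, KV cache per sequence: {kv / 1e9:.1f} GB")
```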
So let's jump in.
What we're going to try and do is we're going to try and estimate the time that it takes to run an inference of a certain shape.
Now, we're not perfect here.
We can't exactly predict the time.
And so instead, we're going to approximate.
And so we're going to say that the time must be greater than or equal to a certain quantity.
And so we're going to consider two different aspects.
We're going to look at the time it takes to do the memory fetches and then the time it takes to do the compute.
And it'll turn out that this actually gives us very strong predictive power, even with such a simple model.
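As a sketch of that lower bound, the snippet below takes the max of the memory time and the compute time for one decode step. The per-GPU bandwidth and FLOPs figures are rough assumptions for an NVL72-class rack, not quoted specs, and the model sizes reuse the hypothetical 70B example above.

```python
# A hedged roofline lower bound for one decode step:
#     time >= max(memory time, compute time).
# Hardware figures are rough assumptions for an NVL72-class rack, not a
# spec sheet, with the model sharded across all 72 GPUs so bandwidth and
# FLOPs aggregate over the rack.

N_GPUS = 72
HBM_BW = N_GPUS * 8e12       # bytes/s: ~8 TB/s HBM bandwidth per GPU (assumed)
PEAK_FLOPS = N_GPUS * 5e15   # FLOP/s: ~5 PFLOP/s dense FP8 per GPU (assumed)

WEIGHT_BYTES = 70e9          # hypothetical 70B model, FP8, 1 byte per param
KV_BYTES_PER_SEQ = 5.2e9     # ~32k-token KV cache for one sequence (from above)
FLOPS_PER_TOKEN = 2 * 70e9   # ~2 FLOPs per parameter per generated token

def decode_step_lower_bound(batch: int) -> float:
    """Lower bound on seconds per decode step at a given batch size.

    The weights are read once per step and amortized across the batch,
    while every sequence streams its own KV cache.
    """
    memory_time = (WEIGHT_BYTES + batch * KV_BYTES_PER_SEQ) / HBM_BW
    compute_time = batch * FLOPS_PER_TOKEN / PEAK_FLOPS
    return max(memory_time, compute_time)

for batch in (1, 32, 1024):
    print(f"batch {batch:5d}: >= {decode_step_lower_bound(batch) * 1e3:.2f} ms/step")
```

On these assumed numbers, every batch size shown is memory-bound rather than compute-bound, and at large batch the KV-cache reads dominate the weight reads, which is exactly why batch size ends up being the big lever.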
So one by one, what is the time that it takes to do the compute?