Reiner Pope
Yeah.
So we'll start off with just memory capacity, without even thinking about the parallelism scheme.
And so the capacity of memory, or the demand on memory, starts with the total number of parameters.
So this is what we need to fit the weights in some system that we are using.
And then we need to fit the KVs as well.
So KVs go as batch size times the length of the context times the bytes of KV per token.
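To make that concrete, here is a minimal sketch of that total memory demand in Python. The function and variable names, and all the numbers in the usage line, are illustrative assumptions, not figures from the conversation.

```python
def total_memory_bytes(n_params, bytes_per_param,
                       batch_size, context_len, kv_bytes_per_token):
    """Total memory demand = weights + KV cache, per the formula above."""
    weights = n_params * bytes_per_param                       # fit all the parameters
    kv_cache = batch_size * context_len * kv_bytes_per_token   # KVs: B * L * bytes/token
    return weights + kv_cache

# Hypothetical numbers, purely for scale: a 500B-parameter model at 2 bytes
# per parameter, batch 32, 8192-token context, 1 MiB of KV per token.
print(total_memory_bytes(500e9, 2, 32, 8192, 2**20) / 2**40, "TiB")
```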
Okay, so...
What I was arguing about in this context, and the case I was making for pipelining, is that there are actually some techniques that allow us to solve this.
So let's consider, we're going to run this on some number of GPUs, and we're going to have one extent, E, which is going to be the expert parallelism.
So when we have this sharding of the expert layers across many GPUs, to what extent do we do that?
How many GPUs?
So we're going to say that this is, for example, 64.
And then P is going to be the extent of pipelining.
And so this is the number of racks, which, who knows, maybe we'll pick four or something like that.
What we want to calculate, so this is like the total memory requirement across the system.
But now I'm going to calculate a memory requirement per GPU.
So for the per-GPU memory requirement, I guess I'll use a lowercase mem.
And well, obviously we just take all of these numbers and divide them by E and P, really easy.
So it's this N_total plus the batch size times the length of the context times the bytes per token, all of this divided by E and P.
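Putting the pieces together, a sketch of that per-GPU calculation might look like the following. E = 64 and P = 4 are the example values from above; every other number is an assumed placeholder carried over from the earlier sketch.

```python
def per_gpu_memory_bytes(n_params, bytes_per_param,
                         batch_size, context_len, kv_bytes_per_token,
                         E, P):
    """Per-GPU memory: total demand divided by the product of
    expert parallelism E and pipeline parallelism P."""
    total = (n_params * bytes_per_param
             + batch_size * context_len * kv_bytes_per_token)
    return total / (E * P)

# Same hypothetical model as above, sharded over E = 64 expert-parallel
# GPUs and P = 4 pipeline stages, i.e. 256 GPUs in total.
print(per_gpu_memory_bytes(500e9, 2, 32, 8192, 2**20, E=64, P=4) / 2**30, "GiB")
```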