Reiner Pope
And then we're also maybe doing it many times.
So that's going to be what makes the difference.
Yeah, so number of activated GPUs, right?
So, like, I don't send to this GPU at all, right?
So there's an explosion from one to, like, three times larger here in this diagram.
Yeah.
The key thing is that I didn't even need to send to this GPU at all, and so that's a big saving.
I see, yeah.
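(A minimal sketch of the saving just described, with a hypothetical expert-to-GPU layout and routing decision; the point is that a token only needs to be sent to GPUs hosting at least one of its activated experts.)

```python
# Hypothetical illustration: 16 experts spread over 4 GPUs, 4 experts each.
experts_per_gpu = 4
activated_experts = [0, 1, 5, 6]  # example routing decision for one token

# A GPU is "activated" only if it hosts at least one activated expert,
# so the token never needs to be sent to the other GPUs at all.
activated_gpus = {e // experts_per_gpu for e in activated_experts}
print(activated_gpus)  # {0, 1} -- GPUs 2 and 3 receive nothing
```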
Okay, so we're going to talk through how much of a slowdown there is, to what extent scale-up is a bottleneck over scale-out.
So we'll jump directly to the ratio of the time spent on scale-up over the time spent on scale-out.
So this is the quantity we're talking about.
And the first consideration is that scale-up is generally eight times faster than scale-out.
And so at a baseline, if the data volumes were the same, we would have this one over eight, which comes from the bandwidths.
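(In symbols, under the assumption above that scale-up bandwidth B_up ≈ 8 × B_out and that the same data volume D moves over each: t_up / t_out = (D / B_up) / (D / B_out) = B_out / B_up = 1/8.)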
But then we have some amount of expansion in how much data we're sending.
So if one token comes in here, then this one token gets routed to, in the DeepSeek case, maybe 32 experts or 16 experts, it gets routed to some number of experts.
So this is the number of activated experts.
And then the same thing applies across multiple layers.
So maybe I'm going to run two layers.
So we also multiply by the number of layers per stage.
Yes, yes.
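(Putting the pieces together: a small sketch of the ratio under this simple model. The parameter names and default values are hypothetical, using the illustrative numbers from the discussion: an 8x bandwidth gap, 16 activated experts, and 2 layers per stage.)

```python
def scaleup_over_scaleout(bandwidth_ratio: float = 8.0,
                          activated_experts: int = 16,
                          layers_per_stage: int = 2) -> float:
    """Time spent on scale-up relative to time spent on scale-out.

    bandwidth_ratio: how much faster scale-up links are than scale-out
        (roughly 8x, per the discussion above).
    activated_experts: experts each token is routed to, which expands
        the data volume moved over the scale-up links.
    layers_per_stage: the expansion repeats for every layer in a stage.
    """
    # Baseline 1/8 from bandwidths, multiplied by the data expansion.
    return (1.0 / bandwidth_ratio) * activated_experts * layers_per_stage

# With 16 activated experts and 2 layers per stage, scale-up takes
# 4x as long as scale-out under this model.
print(scaleup_over_scaleout())  # 4.0
```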