Reiner Pope

👤 Speaker

1157 total appearances

Podcast Appearances

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

There are other kinds of parallelism besides expert parallelism, which we just showed here.

2892.781

One that appears in the literature is tensor parallelism.

2898.346

With the trend towards smaller experts, this has become much less relevant, so we can ignore that.

2900.269

But the other two things that we have available are data parallelism and pipeline parallelism.

2907.482

And they can be a much better fit for using multiple racks.

2912.19

So let's focus on pipeline parallelism specifically.

2917.959

This is one layer of MoE.

2920.705

I'm going to have like 100 more layers up above.

2924.51

I could decide at this point, for example, to move to a different rack, to change rack.

2928.957

Now, is that going to become a communication bottleneck?

2939.652

So...

2943.096

We can actually just solve for when this becomes a communication bottleneck.

2944.448

But before we do that algebraically, let's just sort of visualize it out and sketch the path.

2947.735

So we're going to have a bunch of these: this is another MoE layer, and we're going to have another MoE layer here, and so on.

2951.382

So let's say I change rack here, and then some number of layers later, I change rack here as well.

2958.937

So the methodology we're going to use to determine whether we have a communication bottleneck at this point where we change rack is to compare the scale-out bandwidth requirements to the scale-up bandwidth requirements.

2969.002

So let's try this.

2992.503

I mean, the hint is going to be that...

2993.024

There's a lot more sends here.

2996.727

We're sending many things here, whereas we're only sending one thing here.

2998.311
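
The comparison being set up in these snippets, scale-out traffic at a rack change versus scale-up traffic inside a rack, can be sketched as a back-of-the-envelope calculation in Python. Everything here is an assumption for illustration and not something stated in the episode: the function names, the hidden size, the number of experts each token is routed to, the number of layers between rack changes, and the simple traffic model in which every MoE layer sends a token's activation to each routed expert and back (the "many things"), while a rack change sends the activation once (the "one thing").

```python
def scale_up_bytes_per_token(d_model, top_k, layers_per_stage, bytes_per_elem=2):
    """Bytes one token moves over the in-rack (scale-up) network within one stage.

    Assumed model: in every MoE layer the token's activation vector is sent to
    each of its top_k experts (dispatch) and each result is sent back (combine).
    """
    per_layer = 2 * top_k * d_model * bytes_per_elem
    return layers_per_stage * per_layer


def scale_out_bytes_per_token(d_model, bytes_per_elem=2):
    """Bytes one token moves across racks (scale-out) at one rack change.

    Assumed model: a single activation vector crosses the boundary.
    """
    return d_model * bytes_per_elem


def traffic_ratio(d_model, top_k, layers_per_stage, bytes_per_elem=2):
    """How many times more scale-up traffic than scale-out traffic per token."""
    up = scale_up_bytes_per_token(d_model, top_k, layers_per_stage, bytes_per_elem)
    out = scale_out_bytes_per_token(d_model, bytes_per_elem)
    return up / out


if __name__ == "__main__":
    # Hypothetical values chosen only to make the arithmetic concrete.
    d_model, top_k, layers_per_stage = 4096, 8, 4
    print("scale-up bytes per token per stage:",
          scale_up_bytes_per_token(d_model, top_k, layers_per_stage))
    print("scale-out bytes per token per rack change:",
          scale_out_bytes_per_token(d_model))
    print("ratio:", traffic_ratio(d_model, top_k, layers_per_stage))
```

Under this simplified model the ratio works out to 2 * top_k * layers_per_stage and does not depend on d_model, so the rack change would only become the bottleneck if scale-out bandwidth were lower than scale-up bandwidth by more than that factor.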
