Reiner Pope

👤 Speaker
1157 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

There are other kinds of parallelism besides expert parallelism, which we just showed here.

Another option in the literature is tensor parallelism.

With the trend towards smaller experts, this has become much less relevant, so we can ignore that.

But the other two things that we have available are data parallelism and pipeline parallelism.

And they can be a much better fit for using multiple racks.

So let's focus on pipeline parallelism specifically.

This is one layer of MoE.

I'm going to have like 100 more layers up above.

I could decide at this point, for example, to move to a different rack, that is, to change racks.
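The layer-to-rack assignment being described can be sketched as a toy partition. `pipeline_stages` is a hypothetical helper, and the layer and rack counts are illustrative assumptions, not numbers from the episode:

```python
# Toy sketch: split a stack of MoE layers into contiguous pipeline stages,
# one stage per rack. A rack change happens at each stage boundary.
# All numbers here are illustrative assumptions.

def pipeline_stages(num_layers: int, num_racks: int) -> list[range]:
    """Assign contiguous layer ranges to racks (pipeline stages)."""
    per_stage = num_layers // num_racks
    stages = []
    start = 0
    for r in range(num_racks):
        # The last rack absorbs any remainder layers.
        end = num_layers if r == num_racks - 1 else start + per_stage
        stages.append(range(start, end))
        start = end
    return stages

stages = pipeline_stages(num_layers=100, num_racks=4)
rack_changes = len(stages) - 1  # one cross-rack hop per stage boundary
```

With 100 layers on 4 racks, each rack holds 25 consecutive layers, and activations cross a rack boundary only 3 times per forward pass.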

Now, is that going to become a communication bottleneck?

So...

We can actually just solve for when this becomes a communication bottleneck.

But before we do that algebraically, let's just visualize it and sketch it out.

So we're going to have a bunch of these: this is another MoE layer, and we're going to have another MoE layer here, and so on.

So let's say I change rack here, and then some number of layers later, I change rack here as well.

So the methodology we're going to use to determine whether we have a communication bottleneck at this point where we change rack is to compare the scale-out bandwidth requirements to the scale-up bandwidth requirements.
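As a back-of-envelope version of that comparison, here is a minimal sketch. The hidden dimension, activation precision, stage length, and routing fan-out are all illustrative assumptions, not numbers from the episode:

```python
# Compare scale-out (cross-rack, pipeline boundary) traffic per token with
# scale-up (within-rack, expert-parallel all-to-all) traffic per token.
# All numbers are illustrative assumptions.

d_model = 8192          # hidden size (assumed)
bytes_per_elem = 2      # bf16 activations (assumed)
layers_per_stage = 25   # layers between rack changes (assumed)
top_k = 2               # experts each token is routed to (assumed)

# Scale-out: one activation crosses the rack boundary per token, once per
# stage, so amortized over the layers in that stage.
scale_out_bytes_per_layer = d_model * bytes_per_elem / layers_per_stage

# Scale-up: every MoE layer does an all-to-all dispatch and combine,
# moving roughly 2 * top_k activations per token within the rack.
scale_up_bytes_per_layer = 2 * top_k * d_model * bytes_per_elem

ratio = scale_up_bytes_per_layer / scale_out_bytes_per_layer
# ratio = 2 * top_k * layers_per_stage = 100 with these assumptions
```

So under these assumptions the within-rack (scale-up) traffic is about 100x the cross-rack (scale-out) traffic per layer, which is the sense in which the rack change need not be a bottleneck.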

So let's try this.

I mean, the hint is going to be that...

There's a lot more being sent here.

We're sending many things here, whereas we're only sending one thing here.
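To make that many-versus-one contrast concrete, a quick count of sends per token over a full forward pass. The layer count, routing fan-out, and rack count are illustrative assumptions:

```python
# Counting sends per token in one forward pass. Illustrative numbers only.

num_layers = 100  # MoE layers in the stack (assumed)
top_k = 2         # experts each token is routed to per layer (assumed)
num_racks = 4     # pipeline stages, one per rack (assumed)

# Within a rack: every MoE layer does an all-to-all dispatch to top_k
# experts plus a combine back, so roughly 2 * top_k sends per layer.
within_rack_sends = num_layers * 2 * top_k

# Across racks: a single activation send at each stage boundary.
cross_rack_sends = num_racks - 1
```

With these assumptions a token is sent around 400 times within racks but only 3 times across racks, which is why the cross-rack hop is comparatively cheap.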