
Reiner Pope

👤 Speaker
1157 total appearances


Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

So I drew this as one router. In reality, you would actually have many copies of the router. And so you would have as many routers as GPUs, in fact.

As the incoming traffic.

Yeah.

So these are 64 GPUs.

These are 64 GPUs. It's actually the same GPUs. We just draw them as separate because they're serving different purposes. So at this point, any GPU can be sending to any other GPU.

So this all-to-all pattern of communication that shows up in how the Blackwell racks are configured is a perfect fit for the communication pattern that the MoE actually wants to do.
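A toy sketch of why "any GPU can be sending to any other GPU" in MoE serving: experts are sharded across GPUs, so the router's choice of expert for each token determines which GPU that token must be sent to. The GPU count, experts-per-GPU layout, and the `dispatch` helper below are illustrative assumptions, not the speaker's actual system.

```python
# Toy model of MoE token dispatch (assumed setup, for illustration only).
# Experts 0..127 are laid out contiguously across 64 GPUs, 2 per GPU, so
# expert e lives on GPU e // 2. Any router choice can target any GPU.

NUM_GPUS = 64          # as in the example: 64 GPUs in one rack
EXPERTS_PER_GPU = 2    # assumed layout: experts sharded contiguously

def dispatch(token_expert_ids):
    """Map each (source_gpu, token) to the GPU hosting its chosen expert.

    token_expert_ids: {source_gpu: [expert_id for each token on that GPU]}
    Returns {source_gpu: [(dest_gpu, token_index), ...]} — the per-GPU
    send lists that an all-to-all collective would then execute.
    """
    sends = {gpu: [] for gpu in token_expert_ids}
    for src_gpu, expert_ids in token_expert_ids.items():
        for tok, expert in enumerate(expert_ids):
            dest_gpu = expert // EXPERTS_PER_GPU  # which GPU hosts this expert
            sends[src_gpu].append((dest_gpu, tok))
    return sends

# Example: GPU 0's router picks experts 5 and 100 for its two tokens,
# so GPU 0 must send token 0 to GPU 2 and token 1 to GPU 50.
out = dispatch({0: [5, 100]})
```

Because every source GPU can end up with destinations anywhere in the machine, the communication step is an all-to-all, which is exactly what a fully switched in-rack interconnect provides.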

However, if you think maybe one rack is too slow and I want to do two racks, then I have this challenge that maybe I've got some sort of rack boundary drawn outside here, like this. And I no longer, in fact, have all-to-all communication between all the GPUs in two racks. And so the rack-to-rack communication ends up being a substantial bottleneck. So the fundamental thing here is that one rack actually bounds the size of an expert layer you can do.
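A back-of-the-envelope sketch of why the rack boundary becomes the bottleneck: with two racks, roughly half of every GPU's dispatch traffic has to cross the slower rack-to-rack link. The bandwidth numbers below are made-up illustrative values, not Blackwell or any vendor's specs.

```python
# Illustrative model of an MoE all-to-all split across racks.
# Assumption: traffic from each GPU is spread evenly over all GPUs, so
# (num_racks - 1) / num_racks of it must leave the rack over a slower link.

def all_to_all_time(bytes_per_gpu, intra_bw, inter_bw, num_racks):
    """Time for each GPU to scatter bytes_per_gpu evenly to all GPUs.
    intra_bw / inter_bw: per-GPU bandwidth (bytes/s) inside vs. between racks."""
    cross_fraction = (num_racks - 1) / num_racks  # share leaving the rack
    intra_time = bytes_per_gpu * (1 - cross_fraction) / intra_bw
    inter_time = bytes_per_gpu * cross_fraction / inter_bw
    # Intra- and inter-rack transfers can overlap; the slower leg dominates.
    return max(intra_time, inter_time)

# Made-up numbers: 1 GB per GPU per layer, 400 GB/s in-rack, 50 GB/s cross-rack.
one_rack  = all_to_all_time(1e9, intra_bw=400e9, inter_bw=50e9, num_racks=1)
two_racks = all_to_all_time(1e9, intra_bw=400e9, inter_bw=50e9, num_racks=2)
```

Under these assumed numbers, the two-rack all-to-all is 4× slower than the one-rack case even though the compute doubled, which is the sense in which one rack bounds the expert layer.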

And so this has been part of what's been driving towards larger and larger interconnect domains.

Yeah, and this is a place where it starts to be very different, in fact, between NVIDIA, for example, and Google, and then others, including us.

So generally, a rack is a...

It is a physical structure.