Reiner Pope
So I drew this as one router.
In reality, you would actually have many copies of the router.
And so you would have as many routers as GPUs, in fact, handling the incoming traffic.
Yeah.
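[For reference, here is a minimal sketch of what each of those per-GPU routers computes. This is illustrative, not the speaker's implementation; the top-1 expert choice and the shapes are assumptions.]

```python
import jax.numpy as jnp

def route(tokens, router_weights):
    # tokens: [tokens_per_device, d_model]; router_weights: [d_model, num_experts].
    # Every device holds its own copy of the (small) router weights and
    # routes only the tokens that arrived on that device.
    logits = tokens @ router_weights   # score each token against each expert
    return jnp.argmax(logits, axis=-1) # top-1 expert id per token (assumed top-1)

expert_ids = route(jnp.ones((8, 16)), jnp.zeros((16, 4)))  # toy shapes
```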
So these are 64 GPUs, and these are 64 GPUs. They're actually the same GPUs; we just draw them as separate because they're serving different purposes.
So at this point, any GPU can be sending to any other GPU.
So this all-to-all pattern of communication that shows up in how the Blackwell racks are configured is a perfect fit for the communication pattern that the MoE actually wants to do.
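[As a concrete illustration of that dispatch pattern, here is a minimal JAX sketch, not the speaker's actual code, in which every device exchanges a chunk of its tokens with every other device over one mesh axis. NUM_DEVICES, CAPACITY, D_MODEL, and the axis name "x" are toy assumptions.]

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

NUM_DEVICES = jax.device_count()  # e.g. the 64 GPUs in one rack
CAPACITY = 4                      # tokens each device sends to each peer (assumed)
D_MODEL = 16                      # toy model width

mesh = Mesh(jax.devices(), axis_names=("x",))

def dispatch(tokens):
    # tokens: [NUM_DEVICES * CAPACITY, D_MODEL] on each device. A real MoE
    # would first sort tokens into per-destination chunks using the router's
    # expert ids; here we show only the exchange itself. Chunk i on every
    # device is sent to device i; each device then holds one chunk from
    # every peer, concatenated along the leading axis.
    return jax.lax.all_to_all(tokens, "x", split_axis=0, concat_axis=0, tiled=True)

dispatch = shard_map(dispatch, mesh=mesh, in_specs=P("x"), out_specs=P("x"),
                     check_rep=False)  # some JAX versions need this for collectives

tokens = jnp.zeros((NUM_DEVICES * NUM_DEVICES * CAPACITY, D_MODEL))
out = dispatch(tokens)  # same global shape; chunks permuted device-to-device
```

[Within one rack, every leg of this exchange runs over the NVLink domain at full speed, which is the "perfect fit" being described.]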
However, if you think maybe one rack is too slow and you want to do two racks, then you have this challenge that you've now got some sort of rack boundary drawn out here, like this, and you no longer, in fact, have all-to-all communication between all the GPUs in the two racks.
And so the rack-to-rack communication ends up being a substantial bottleneck.
So the fundamental thing here is that one rack actually bounds the size of the expert layer you can do.
And so this has been part of what's been driving towards larger and larger interconnect domains.
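[A rough back-of-envelope shows why the rack boundary dominates. The bandwidth figures below are assumed, illustrative numbers, not measured values: on the order of 900 GB/s per GPU over in-rack NVLink versus ~50 GB/s per GPU over a cross-rack NIC.]

```python
NVLINK_GBPS_PER_GPU = 900.0   # assumed in-rack bandwidth per GPU, GB/s
NETWORK_GBPS_PER_GPU = 50.0   # assumed cross-rack bandwidth per GPU, GB/s

# With experts spread over two racks, roughly half of every GPU's
# all-to-all dispatch traffic must cross the rack boundary.
frac_cross_rack = 0.5
bytes_per_gpu = 1.0  # normalize: 1 GB of dispatch traffic per GPU

t_one_rack = bytes_per_gpu / NVLINK_GBPS_PER_GPU
t_two_rack = (bytes_per_gpu * frac_cross_rack) / NETWORK_GBPS_PER_GPU

print(f"one rack:  {t_one_rack*1e3:.2f} ms per GB of dispatch traffic")
print(f"two racks: {t_two_rack*1e3:.2f} ms "
      f"(cross-rack leg dominates, {t_two_rack/t_one_rack:.0f}x slower)")
```

[Under these assumed numbers, the cross-rack leg is roughly 9x slower than the in-rack exchange, which is why one rack effectively caps the expert layer.]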
Yeah, and this is a place where it starts to be very different, in fact, between NVIDIA, for example, and Google, and then others, including us.
So generally, a rack is a physical structure.