It's a few meters tall and a meter or two wide, depending on the configuration.
And it holds some number of GPUs or XPUs, typically about 64.
What constrains its size is power delivery, weight, and cooling capacity.
It ends up being about this size in most cases because of these physical constraints.
So then when I deploy a data center, that data center may have thousands of these racks.
So I've got one of these tall racks with a bunch of GPUs in it, and so on.
And then I put another rack.
Next one.
You make it sound so easy.
Yeah, right?
I just drop them in.
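To put rough numbers on that scale, here's a quick sketch. Only the ~64 accelerators per rack figure comes from the discussion above; the rack count is an assumed example standing in for "thousands":

```python
# Back-of-the-envelope data center scale from the numbers discussed above.
GPUS_PER_RACK = 64   # "typically about 64" GPUs or XPUs per rack
racks = 2_000        # assumed example for "thousands of these racks"

total_accelerators = GPUS_PER_RACK * racks
print(f"{racks:,} racks x {GPUS_PER_RACK} GPUs = {total_accelerators:,} accelerators")
# 2,000 racks x 64 GPUs = 128,000 accelerators
```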
In NVIDIA's case, the communication topology is actually that they put the GPUs on the outside of the rack, and they put the switches on the inside of the rack.
So what this ends up being is a set of switches in the middle.
These are the NVSwitches.
And then they run a bunch of cables: every single GPU has cables going to the switches in the middle.
So every GPU connects to the switches in the middle, and the switches have connections back to all the GPUs. That way, all of the GPUs can talk to all the other GPUs in just two hops: going to the switch, then going to the other GPU.
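To make the two-hop claim concrete, here's a toy model of that topology: every GPU links to every switch, and traffic goes GPU to switch to GPU. The switch count is an assumed illustrative value, not a specific NVIDIA configuration:

```python
from itertools import combinations

# Toy model of the rack topology described above: each GPU has a direct
# cable to every switch in the middle of the rack.
NUM_GPUS = 64      # from the discussion: typically about 64 per rack
NUM_SWITCHES = 4   # assumed illustrative value

gpus = [f"gpu{i}" for i in range(NUM_GPUS)]
switches = [f"switch{i}" for i in range(NUM_SWITCHES)]

# Adjacency: every GPU cables to every switch; no GPU-to-GPU links.
links = {node: set() for node in gpus + switches}
for g in gpus:
    for s in switches:
        links[g].add(s)
        links[s].add(g)

def hops(src, dst):
    """BFS shortest-path length between two nodes in the rack."""
    frontier, seen, dist = {src}, {src}, 0
    while frontier:
        if dst in frontier:
            return dist
        frontier = {n for f in frontier for n in links[f]} - seen
        seen |= frontier
        dist += 1
    return None

# Every GPU pair is reachable in exactly two hops: GPU -> switch -> GPU.
assert all(hops(a, b) == 2 for a, b in combinations(gpus, 2))
print("all GPU pairs reachable in 2 hops")
```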
Now, when I want to leave the rack, I end up going via a different path.
The GPUs also have a much slower connectivity for that, typically about eight times slower.
So the green that I drew here, in the GPU case, is the NVLink.
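Here's a rough sketch of what that bandwidth cliff means for moving data. Only the roughly 8x ratio comes from the discussion above; the absolute NVLink-class bandwidth and the payload size are assumed, illustrative numbers:

```python
# Back-of-the-envelope cost of crossing the rack boundary.
INTRA_RACK_GBPS = 900.0                   # assumed per-GPU NVLink-class bandwidth (GB/s)
INTER_RACK_GBPS = INTRA_RACK_GBPS / 8.0   # ~8x slower once you leave the rack

def transfer_seconds(gigabytes: float, bandwidth_gbps: float) -> float:
    """Time to move `gigabytes` of data over a link of `bandwidth_gbps` GB/s."""
    return gigabytes / bandwidth_gbps

payload_gb = 16.0  # assumed example payload, e.g. a shard of weights
print(f"intra-rack: {transfer_seconds(payload_gb, INTRA_RACK_GBPS) * 1e3:.1f} ms")
print(f"inter-rack: {transfer_seconds(payload_gb, INTER_RACK_GBPS) * 1e3:.1f} ms")
```

Same payload, same racks; the only change is which link it crosses, and the transfer time scales up by the same factor of eight.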