
Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

It's a few meters tall, a meter or two wide, depending on configuration.

And it holds some number of GPUs or XPUs, typically about 64.

What constrains it to a certain size is power delivery, weight, and cooling capacity.

It ends up being about this size in many cases because of these physical constraints.
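The power-delivery constraint can be made concrete with a back-of-envelope sketch. All numbers below except the GPU count are illustrative assumptions, not figures from the episode:

```python
# Rough rack-power estimate showing why power delivery caps rack size.
# Only GPUS_PER_RACK comes from the episode; the rest are assumptions.

GPUS_PER_RACK = 64        # "typically about 64" (from the episode)
WATTS_PER_GPU = 1000      # assumed ~1 kW per accelerator, board included
OVERHEAD = 1.3            # assumed factor for CPUs, switches, fans, PSU loss

rack_power_kw = GPUS_PER_RACK * WATTS_PER_GPU * OVERHEAD / 1000
print(f"Approximate rack power: {rack_power_kw:.0f} kW")
```

Under these assumptions a single rack draws on the order of 80 kW or more, which is why delivering power and removing that heat bound how many accelerators fit in one rack.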

So then when I deploy a data center, a data center may have thousands of these racks.

So I've got one of these tall racks that's got a bunch of GPUs in it and so on.

And then I put another rack

Next one.

You make it sound so easy.

Yeah, right?

I just drop them in.

In NVIDIA's case, the communication topology is actually this: they put the GPUs on the outside of the rack, and they put the switches on the inside of the rack.

So what this ends up being is that there's a set of switches in here.

These are the NVSwitches.

And then they run a bunch of cables.

Every single GPU has cables going to the switches in the middle.

So every GPU goes to the switches in the middle, and the switches have connections to all the GPUs, so any GPU can talk to any other GPU in just two hops: going to the switch, then going to the other GPU.
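The two-hop property can be checked on a toy model of this topology: a bipartite graph where every GPU is cabled to every switch. The GPU and switch counts here are small illustrative assumptions, not real rack numbers:

```python
# Toy model of the switched rack topology described above: every GPU is
# cabled to every switch, so any GPU reaches any other GPU in two hops.

N_GPUS, N_SWITCHES = 8, 3  # assumed small example, not real rack counts

# Bipartite adjacency: GPUs connect only to switches, and vice versa.
edges = {f"gpu{g}": {f"sw{s}" for s in range(N_SWITCHES)} for g in range(N_GPUS)}
for s in range(N_SWITCHES):
    edges[f"sw{s}"] = {f"gpu{g}" for g in range(N_GPUS)}

def hops(src, dst):
    """Breadth-first search returning the shortest path length in hops."""
    seen, frontier, d = {src}, [src], 0
    while frontier:
        if dst in frontier:
            return d
        nxt = [v for u in frontier for v in edges[u] if v not in seen]
        seen.update(nxt)
        frontier, d = nxt, d + 1
    return None

print(hops("gpu0", "gpu7"))  # 2: GPU -> switch -> GPU
```

Because every GPU-to-GPU path is GPU → switch → GPU, the diameter of the in-rack network is two regardless of how many GPUs share the rack, which is what makes the all-to-all communication cheap.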

Now, when I want to leave the rack, I end up going via a different path.

The GPUs also have a much slower connection out of the rack, typically about eight times slower.
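That 8× gap translates directly into transfer time. A minimal sketch, assuming an illustrative in-rack bandwidth and message size (only the 8× ratio comes from the episode):

```python
# Transfer-time comparison for the in-rack link vs. the ~8x slower
# out-of-rack link described above. Bandwidths and sizes are assumptions.

NVLINK_GBPS = 400                 # assumed in-rack bandwidth per GPU, GB/s
SCALE_OUT_GBPS = NVLINK_GBPS / 8  # "typically about eight times slower"

tensor_gb = 10                    # assumed message size: 10 GB
in_rack_s = tensor_gb / NVLINK_GBPS
out_of_rack_s = tensor_gb / SCALE_OUT_GBPS
print(f"in-rack: {in_rack_s * 1000:.0f} ms, "
      f"out-of-rack: {out_of_rack_s * 1000:.0f} ms")
```

The same message takes eight times longer once it has to leave the rack, which is why placement of communication-heavy work inside versus across racks matters so much.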

So the green that I drew here, in the GPU case, is the NVLink.