
Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

It's a few meters tall, a meter or two wide, depending on configuration.

And it holds some number of GPUs or XPUs, typically about 64.

What constrains it to a certain size is power delivery, weight, and cooling capacity.

It ends up being about this size in many cases because of these physical constraints.
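The power-delivery constraint can be made concrete with a back-of-envelope sketch. All numbers below except the GPU count are illustrative assumptions, not figures from the episode:

```python
# Rough rack-power estimate showing why power delivery caps rack size.
# Only GPUS_PER_RACK comes from the episode; the rest are assumptions.

GPUS_PER_RACK = 64        # "typically about 64" (from the episode)
WATTS_PER_GPU = 1000      # assumed ~1 kW per accelerator, board included
OVERHEAD = 1.3            # assumed factor for CPUs, switches, fans, PSU loss

rack_power_kw = GPUS_PER_RACK * WATTS_PER_GPU * OVERHEAD / 1000
print(f"Approximate rack power: {rack_power_kw:.0f} kW")
```

Under these assumptions a single rack draws on the order of 80 kW or more, which is why delivering power and removing that heat bound how many accelerators fit in one rack.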

So then when I deploy a data center, a data center may have thousands of these racks.

So I've got one of these tall racks that's got a bunch of GPUs in it and so on.

And then I put another rack

Next one.

You make it sound so easy.

Yeah, right?

I just drop them in.

In NVIDIA's case, the communication topology is actually this: they put the GPUs on the outside of the rack, and they put the switches on the inside of the rack.

So what this ends up being is that there's a set of switches in here.

These are the NVSwitches.

And then they run a bunch of cables.

Every single GPU has cables going to the switches in the middle.

So every GPU goes to the switches in the middle, and the switches have connections to all the GPUs, so any GPU can talk to any other GPU in just two hops: going to the switch, then going to the other GPU.
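The two-hop property can be checked on a toy model of this topology: a bipartite graph where every GPU is cabled to every switch. The GPU and switch counts here are small illustrative assumptions, not real rack numbers:

```python
# Toy model of the switched rack topology described above: every GPU is
# cabled to every switch, so any GPU reaches any other GPU in two hops.

N_GPUS, N_SWITCHES = 8, 3  # assumed small example, not real rack counts

# Bipartite adjacency: GPUs connect only to switches, and vice versa.
edges = {f"gpu{g}": {f"sw{s}" for s in range(N_SWITCHES)} for g in range(N_GPUS)}
for s in range(N_SWITCHES):
    edges[f"sw{s}"] = {f"gpu{g}" for g in range(N_GPUS)}

def hops(src, dst):
    """Breadth-first search returning the shortest path length in hops."""
    seen, frontier, d = {src}, [src], 0
    while frontier:
        if dst in frontier:
            return d
        nxt = [v for u in frontier for v in edges[u] if v not in seen]
        seen.update(nxt)
        frontier, d = nxt, d + 1
    return None

print(hops("gpu0", "gpu7"))  # 2: GPU -> switch -> GPU
```

Because every GPU-to-GPU path is GPU → switch → GPU, the diameter of the in-rack network is two regardless of how many GPUs share the rack, which is what makes the all-to-all communication cheap.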

Now, when I want to leave the rack, I end up going via a different path.

The GPUs also have a much slower connection out of the rack, typically about eight times slower.
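That 8× gap translates directly into transfer time. A minimal sketch, assuming an illustrative in-rack bandwidth and message size (only the 8× ratio comes from the episode):

```python
# Transfer-time comparison for the in-rack link vs. the ~8x slower
# out-of-rack link described above. Bandwidths and sizes are assumptions.

NVLINK_GBPS = 400                 # assumed in-rack bandwidth per GPU, GB/s
SCALE_OUT_GBPS = NVLINK_GBPS / 8  # "typically about eight times slower"

tensor_gb = 10                    # assumed message size: 10 GB
in_rack_s = tensor_gb / NVLINK_GBPS
out_of_rack_s = tensor_gb / SCALE_OUT_GBPS
print(f"in-rack: {in_rack_s * 1000:.0f} ms, "
      f"out-of-rack: {out_of_rack_s * 1000:.0f} ms")
```

The same message takes eight times longer once it has to leave the rack, which is why placement of communication-heavy work inside versus across racks matters so much.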

So the green that I drew here, in the GPU case, is the NVLink.