
Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

More generally, it's called the scale-up network.

This is the scale-up network.

You will typically also have a scale-out network, which allows you to connect to some data center switch.

So data center switch.

Then all of the GPUs will have some connectivity up to some data center switch somewhere. This is the scale-out network, and it tends to be about eight times slower in bandwidth. So the challenge, if you want to, for example, lay out a mixture-of-experts layer across two racks, is that half of the GPUs here are going to want to talk to the GPUs over here.

And so just on average, when I look at where the tokens on these GPUs want to go, half of the tokens want to go inside the rack.

That's great.

They can use the fast scale-up network.

But half the tokens are going to want to leave the rack and go to the other rack.

And that's not as good.

They're going to need to use a much slower network.

And so that becomes the bottleneck on the all-to-all pattern.
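
A quick back-of-envelope sketch of that bottleneck, in Python. The absolute bandwidth is an assumed placeholder; only the roughly 8x scale-up versus scale-out ratio and the half-in, half-out token split come from the discussion above.

```python
# Back-of-envelope model of the all-to-all bottleneck when a
# mixture-of-experts layer is laid out across two racks. The bandwidth
# figure is an illustrative placeholder; only the ~8x scale-up vs.
# scale-out ratio comes from the discussion above.

SCALE_UP_GBPS = 400.0               # per-GPU in-rack bandwidth (GB/s), assumed
SCALE_OUT_GBPS = SCALE_UP_GBPS / 8  # cross-rack bandwidth, ~8x slower

def all_to_all_time(gb_per_gpu: float) -> float:
    """Seconds for one GPU to route its tokens when destinations are
    uniform across both racks: half stay in-rack, half cross over."""
    t_in_rack = (gb_per_gpu / 2) / SCALE_UP_GBPS
    t_cross_rack = (gb_per_gpu / 2) / SCALE_OUT_GBPS
    # The two transfers can proceed in parallel, so the slower
    # (cross-rack) path sets the pace for the whole all-to-all.
    return max(t_in_rack, t_cross_rack)

# Versus keeping everything on the fast in-rack network:
single_rack = 1.0 / SCALE_UP_GBPS    # 1 GB of tokens, all in-rack
two_racks = all_to_all_time(1.0)     # same 1 GB, split across racks
print(f"slowdown from crossing racks: {two_racks / single_rack:.1f}x")  # 4.0x
```

Under this uniform-routing assumption, splitting across racks costs about 4x even though only half the traffic leaves the rack, because that half moves at an eighth of the speed.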

A different choice would be: well, why don't I have a big switch here and connect everything to some much bigger switch that actually combines the two racks together?

There are many ideas in this direction, but in general, the reason you have this hierarchy of switches rather than one big switch is to manage the cabling congestion.

You just need to run a large number of cables.

Yeah, exactly.
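
A rough counting sketch of that cabling pressure. The GPU count, links per GPU, and oversubscription factor are hypothetical placeholders, not figures from the episode:

```python
# Rough counting behind "you just need to run a large number of cables".
# Every number here is an assumed placeholder, not a figure from the episode.

GPUS = 144             # e.g. two racks of 72 GPUs (assumption)
LINKS_PER_GPU = 18     # scale-up links per GPU (assumption)
OVERSUBSCRIPTION = 8   # scale-out tier is ~8x thinner, per the discussion

# One big flat switch combining both racks: every GPU link becomes a
# long cable run converging on a single switching layer.
flat_cables = GPUS * LINKS_PER_GPU                        # 2592 long runs

# Hierarchy: full-bandwidth cabling stays short, inside each rack, and
# only a thinner oversubscribed set of uplinks leaves the rack.
uplink_cables = GPUS * LINKS_PER_GPU // OVERSUBSCRIPTION  # 324 long runs

print(f"flat switch: {flat_cables} long cables; "
      f"hierarchy: {uplink_cables} long cables + local in-rack wiring")
```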

Why not just have a million chips in scale-up?

Rubin will be, I don't know, is it 500-something? 500-something, yeah. What has allowed that to happen from Hopper to Blackwell is mostly just the decision to switch from trays as the form factor (one of these is a tray) to racks as the form factor. That's a product decision; there wasn't a substantial technical barrier there.

Switching from the 64 or so to 500 or so, there's a bit of chintz in the math there, but there is at least a genuine 4x increase, which is coming from a much more complicated and difficult rack design. So that is actually new physical design to run more cables, and the cable complication is just the…
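
Taking the figures in this last exchange at face value (the 8-GPU tray size is an added assumption; "64 or so" and "500-something" are the speaker's own rough numbers):

```python
# Scale-up domain sizes as described in this exchange. The counts are
# the speaker's rough figures plus one assumption, not authoritative specs.
scale_up_chips = {
    "Hopper (tray form factor)": 8,       # assumption: one 8-GPU tray
    "Blackwell (rack form factor)": 64,   # "the 64 or so"
    "Rubin": 500,                         # "500-something"
}

headline = scale_up_chips["Rubin"] / scale_up_chips["Blackwell (rack form factor)"]
print(f"headline growth: ~{headline:.1f}x")  # ~7.8x on paper
# After discounting the "chintz in the math" (changed counting
# conventions), the speaker credits only a genuine ~4x, coming from
# the more complicated rack design and the extra cables it can run.
```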