Reiner Pope

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

We have a divisibility problem. This [72 GPUs] is not a power of two. So we'll just simplify and say we're only going to use 64 of them. Just ignore the other eight; it's not a big deal. And so we have four experts per GPU. Very simple.
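A minimal Python sketch of the arithmetic above. It assumes 72 GPUs (the 64 used plus the eight ignored) and a hypothetical total of 256 experts, picked only because it matches four experts on each of 64 GPUs; the episode does not state the expert count at this point. The helper names are illustrative, not from the source.

```python
from typing import Tuple


def largest_power_of_two_at_most(n: int) -> int:
    """Return the largest power of two that is <= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n.bit_length() - 1)


def plan_expert_placement(num_gpus: int, num_experts: int) -> Tuple[int, int, int]:
    """Return (gpus_used, gpus_idle, experts_per_gpu).

    Rounds the GPU count down to a power of two, as the speaker does with 72 -> 64,
    and leaves the remaining GPUs idle. num_experts is a hypothetical input.
    """
    gpus_used = largest_power_of_two_at_most(num_gpus)
    if num_experts % gpus_used != 0:
        raise ValueError("experts do not divide evenly across the GPUs used")
    return gpus_used, num_gpus - gpus_used, num_experts // gpus_used


if __name__ == "__main__":
    used, idle, per_gpu = plan_expert_placement(num_gpus=72, num_experts=256)
    print(f"use {used} GPUs, leave {idle} idle, {per_gpu} experts per GPU")
    # prints: use 64 GPUs, leave 8 idle, 4 experts per GPU
```

Idling eight GPUs trades about 11% of the hardware for an even split; the sketch simply raises an error when the experts would not divide evenly.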

For the sake of the diagram, let's say we have two experts per GPU. So we end up drawing the GPU boundaries here: every pair of experts is on its own GPU.
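The diagram's layout can be read as contiguous (block) placement: expert `e` lives on GPU `e // experts_per_gpu`. A small sketch of that mapping follows. Block placement is an assumption, since the transcript only says each pair of experts shares a GPU, and the total of eight experts is hypothetical, chosen to keep the printout short.

```python
from typing import List


def gpu_for_expert(expert_id: int, experts_per_gpu: int) -> int:
    """GPU that hosts an expert under contiguous (block) placement (an assumption)."""
    return expert_id // experts_per_gpu


def experts_on_gpu(gpu_id: int, experts_per_gpu: int) -> List[int]:
    """Expert ids hosted on one GPU under the same block placement."""
    start = gpu_id * experts_per_gpu
    return list(range(start, start + experts_per_gpu))


if __name__ == "__main__":
    experts_per_gpu = 2  # "two experts per GPU", as in the diagram
    num_experts = 8      # hypothetical total, kept small for display
    for gpu in range(num_experts // experts_per_gpu):
        print(f"GPU {gpu}: experts {experts_on_gpu(gpu, experts_per_gpu)}")
```

The two functions are inverses of each other, so either side (the router or the GPU) can compute where an expert lives without a lookup table.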

And then we can look at the communication cost. We had some tokens stored centrally here. They get routed to all of these experts, and so there is some communication cost paid here. There's the same communication cost paid on the output.
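To make the two communication steps concrete, here is a single-process sketch of a mixture-of-experts layer. A dispatch step groups tokens by the expert the router picked, and a combine step sums the gate-weighted expert outputs back into each token. The router output (`routes`), the toy experts, and the function names are all hypothetical; on a multi-GPU system, dispatch and combine would be the network transfers described above, while here nothing leaves the process.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Route = List[Tuple[int, float]]  # (expert_id, gate_weight) pairs for one token


def dispatch(routes: List[Route]) -> Dict[int, List[Tuple[int, float]]]:
    """Group (token_index, gate_weight) pairs by destination expert.

    On real hardware this grouping decides what is sent to each expert's GPU.
    """
    buckets: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for token_idx, choices in enumerate(routes):
        for expert_id, weight in choices:
            buckets[expert_id].append((token_idx, weight))
    return buckets


def moe_layer(tokens: List[Vector], routes: List[Route],
              experts: Dict[int, Callable[[Vector], Vector]]) -> List[Vector]:
    """Dispatch tokens, run each expert on its bucket, then combine weighted outputs."""
    out = [[0.0] * len(t) for t in tokens]
    for expert_id, items in dispatch(routes).items():
        for token_idx, weight in items:
            y = experts[expert_id](tokens[token_idx])  # result travels back (combine)
            out[token_idx] = [o + weight * v for o, v in zip(out[token_idx], y)]
    return out


if __name__ == "__main__":
    # Hypothetical toy experts: expert e scales its input by (e + 1).
    experts = {e: (lambda x, s=e + 1: [s * v for v in x]) for e in range(4)}
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    routes = [[(0, 0.5), (3, 0.5)], [(1, 1.0)]]
    print(moe_layer(tokens, routes, experts))  # [[2.5, 5.0], [6.0, 8.0]]
```

Every (token, expert) pair in `routes` costs one vector on the way out and one on the way back, which is why the output side pays the same cost as the input side.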

And then the hope is that this does not become communication limited.

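Whether the layer becomes communication limited can be estimated with back-of-envelope arithmetic. The sketch below assumes that each (token, expert) pair sends one `d_model`-wide activation out and receives one back, that elements are bf16 (2 bytes), and that each expert is an ungated MLP of two matmuls (`d_model × d_ff` and `d_ff × d_model`). None of these parameters come from the episode, and the hardware rates are inputs you supply.

```python
def moe_comm_bytes(num_tokens: int, top_k: int, d_model: int, bytes_per_elem: int = 2) -> int:
    """Bytes moved for one MoE layer: activations out to the experts plus results back.

    Upper bound: counts every (token, expert) pair, including pairs whose expert
    sits on the token's own GPU and therefore never touch the network.
    """
    one_way = num_tokens * top_k * d_model * bytes_per_elem  # dispatch
    return 2 * one_way  # combine returns the same volume


def expert_flops(num_tokens: int, top_k: int, d_model: int, d_ff: int) -> int:
    """FLOPs of an ungated two-matmul expert MLP (assumption), 2 FLOPs per multiply-add."""
    return num_tokens * top_k * 2 * (2 * d_model * d_ff)


def flops_per_comm_byte(d_model: int, d_ff: int, bytes_per_elem: int = 2) -> float:
    """Compute per byte of traffic; num_tokens and top_k cancel, leaving 2 * d_ff / bytes."""
    return expert_flops(1, 1, d_model, d_ff) / moe_comm_bytes(1, 1, d_model, bytes_per_elem)


def is_comm_limited(d_model: int, d_ff: int, gpu_flops_per_s: float,
                    link_bytes_per_s: float, bytes_per_elem: int = 2) -> bool:
    """True if moving activations would take longer than the expert matmuls,
    with both running at the supplied peak rates."""
    intensity = flops_per_comm_byte(d_model, d_ff, bytes_per_elem)
    return intensity < gpu_flops_per_s / link_bytes_per_s
```

Under these assumptions the token count and top-k cancel, so the compute per byte communicated is `2 * d_ff / bytes_per_elem`. The layer is communication limited when that value falls below the GPU's FLOP rate divided by its per-GPU network bandwidth. For example, `is_comm_limited(4096, 2048, gpu_flops_per_s=1e15, link_bytes_per_s=1e11)` returns `True` for these placeholder rates.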

Now, what is the traffic pattern here? Any GPU may end up talking to any other GPU, depending on the routing decisions made by the model. So this is an all-to-all traffic pattern.
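A quick simulation shows why the pattern is all-to-all. This sketch makes several simplifying assumptions: tokens start evenly sharded across GPUs (rather than the single central store in the diagram), the learned router is replaced by uniformly random top-k choices, and experts use contiguous placement. Entry `[src][dst]` counts token copies sent from GPU `src` to GPU `dst`; diagonal entries stay on-device.

```python
import random
from typing import List


def all_to_all_traffic(num_gpus: int, experts_per_gpu: int, tokens_per_gpu: int,
                       top_k: int, seed: int = 0) -> List[List[int]]:
    """Count token copies sent from each source GPU to each destination GPU.

    Assumptions: tokens are sharded evenly across GPUs, each token picks top_k
    distinct experts uniformly at random (a stand-in for a learned router), and
    expert e lives on GPU e // experts_per_gpu.
    """
    rng = random.Random(seed)
    num_experts = num_gpus * experts_per_gpu
    traffic = [[0] * num_gpus for _ in range(num_gpus)]
    for src in range(num_gpus):
        for _ in range(tokens_per_gpu):
            for expert in rng.sample(range(num_experts), top_k):
                traffic[src][expert // experts_per_gpu] += 1
    return traffic


if __name__ == "__main__":
    n = 8
    matrix = all_to_all_traffic(num_gpus=n, experts_per_gpu=2, tokens_per_gpu=256, top_k=2)
    for src, row in enumerate(matrix):
        print(f"GPU {src} sends: {row}")
    pairs = sum(1 for i, row in enumerate(matrix) for j, c in enumerate(row) if i != j and c > 0)
    print(f"{pairs} of {n * (n - 1)} ordered GPU pairs exchange tokens")
```

With 8 GPUs, 2 experts each, 256 tokens per GPU, and top-2 routing, essentially every entry is nonzero, so every GPU sends to every other GPU. A trained router is not uniform, so real traffic matrices may be noticeably less even.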

Yeah.
