With a small number of pipeline stages, this is not a huge latency impact.
Yeah, if it goes from 20 to 30, right?
Or something like that, yeah.
So just to chart the path that it goes through: here you're going from your GPU or TPU or whatever to a network card, which then goes to a top-of-rack switch, and then hops over to the other rack and does the same thing in reverse.
So you sort of have to sum up the latencies of these different things.
So this is the same thing as the DC switch?
It may, in fact, go up to a DC switch and back.
Depends on deployment configuration.
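To make "sum up the latencies" concrete, here is a minimal Python sketch of that path. The function name and every latency figure are illustrative assumptions for this example, not numbers from the conversation or from any particular hardware.

```python
def cross_rack_latency_us(via_dc_switch: bool = True) -> float:
    """One-way latency of a single cross-rack hop, summing the path
    described above: GPU -> NIC -> top-of-rack switch [-> DC switch]
    -> ToR -> NIC -> GPU. All figures are illustrative placeholders."""
    path = [1.0, 1.0]        # GPU -> NIC, NIC -> top-of-rack switch
    if via_dc_switch:
        path += [2.0, 2.0]   # up to the datacenter switch and back down
    else:
        path += [1.0]        # assumed direct ToR -> ToR link instead
    path += [1.0, 1.0]       # far side: ToR -> NIC, NIC -> GPU
    return sum(path)

# With p pipeline stages, an activation crosses p - 1 stage boundaries,
# so the added end-to-end latency is roughly (p - 1) * per-hop latency.
p = 8
print(cross_rack_latency_us(), "us per hop;",
      (p - 1) * cross_rack_latency_us(), "us added across the pipeline")
```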
Yeah, so we talked about the latency of this hop.
There is also the T_mem latency, the memory time latency, which is actually massively improved by larger scale-up domains.
So...
I'll recall T_mem down here.
T_mem for the weights.
This was equal to the total number of parameters divided by the memory bandwidth.
Which memory bandwidth are we talking about here? Is it just one GPU's?
In fact, it is the number of GPUs that I can use in parallel to load these weights.
So I can't use different pipeline stages in parallel because they're not running at the same time, but I can use all the GPUs in my scale-up domain in parallel to load the weights.
And so this is actually extremely effective.
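As a rough sketch of the formula being written out: if each of the N chips in the scale-up domain streams a disjoint 1/N shard of the weights, their memory bandwidths add, so T_mem shrinks linearly with the domain size. The chip count and bandwidth below are illustrative assumptions, not figures from the conversation.

```python
def t_mem_weights_s(param_bytes: float, n_chips: int,
                    hbm_bw_bytes_per_s: float) -> float:
    """T_mem for the weights: total parameter bytes divided by the
    aggregate memory bandwidth of the scale-up domain. Assumes each
    chip loads a disjoint 1/n_chips shard of the weights in parallel."""
    return param_bytes / (n_chips * hbm_bw_bytes_per_s)

# Illustrative: 70B parameters in bf16 (2 bytes each), 8 chips at an
# assumed 3.35 TB/s of HBM bandwidth each -> about 5.2 ms per weight pass.
print(t_mem_weights_s(70e9 * 2, n_chips=8, hbm_bw_bytes_per_s=3.35e12))
```

Doubling the scale-up domain doubles the aggregate bandwidth and halves this term, which is the "extremely effective" improvement being described.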
So basically I end up with a term here, this memory bandwidth term itself,