Reiner Pope

Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Okay.

So if we're just storing it in HBM, it has this sort of cost profile.

And then if we store in DDR, it's actually going to take some time.

So we get the same thing here, but it's bytes per token over DDR capacity times DDR cost per second.

But now this has a cost to retrieve that is higher than the HBM because we need to copy it into the HBM.

And so this is bytes per token over DDR bandwidth.

And then this consumes some amount of the DDR as well.
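The two DDR terms described above can be written out as a minimal sketch. Everything named here is an assumption for illustration: the `MemoryTier` fields, the function names, the idea that the per-token quantity is a byte count (`bytes_per_token`), and all of the numbers, which are placeholders rather than real hardware figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTier:
    name: str
    capacity_bytes: float    # total bytes the tier can hold
    cost_per_second: float   # hypothetical cost of owning the whole tier for one second
    bandwidth_to_hbm: float  # bytes/second when copying into HBM (inf for HBM itself)


def storage_cost_rate(tier: MemoryTier, bytes_per_token: float) -> float:
    """Cost per second of keeping one token resident:
    (bytes per token / tier capacity) * tier cost per second."""
    return bytes_per_token / tier.capacity_bytes * tier.cost_per_second


def retrieve_seconds(tier: MemoryTier, bytes_per_token: float) -> float:
    """Time to move one token's bytes into HBM:
    bytes per token / tier bandwidth. Zero for data already in HBM."""
    return bytes_per_token / tier.bandwidth_to_hbm


# Placeholder numbers, chosen only to make the arithmetic easy to follow.
HBM = MemoryTier("HBM", capacity_bytes=100e9, cost_per_second=1.0,
                 bandwidth_to_hbm=float("inf"))
DDR = MemoryTier("DDR", capacity_bytes=1000e9, cost_per_second=0.5,
                 bandwidth_to_hbm=100e9)
BYTES_PER_TOKEN = 1e5

for tier in (HBM, DDR):
    print(tier.name,
          storage_cost_rate(tier, BYTES_PER_TOKEN),
          retrieve_seconds(tier, BYTES_PER_TOKEN))
```

With these placeholder values, DDR is cheaper to store in but has a nonzero retrieve time, while HBM is the reverse.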

And every scale-up has DDR and Flash?

That's really a deployment question, and so you can choose that.

NVIDIA does deploy in this form.

It has both.

Yeah, I mean, it depends on what you define a retrieve to be.

Here I'm defining retrieve to be move it into HBM so that you can start actually doing inference on it.

Sort of by definition.

So these are three things, and I guess I ordered them wrong.

In general, if you're balancing two costs and you've got different tiers in the memory hierarchy, you should expect as this cost goes up, this cost should go down.

So you can kind of see where the zeros are, and I should have ordered them this one first, this one second, and this one third.
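The ordering rule can be sketched as a small check: sort the tiers by retrieve cost, then confirm that storage cost falls as retrieve cost rises. The tier names, the tuple layout, and the numbers are all hypothetical placeholders, and the function names are introduced here only for illustration.

```python
def order_tiers(tiers):
    """Sort (name, storage_cost, retrieve_cost) tuples by ascending retrieve cost."""
    return sorted(tiers, key=lambda t: t[2])


def is_balanced_hierarchy(tiers):
    """True if, once sorted by rising retrieve cost, storage cost strictly falls.

    A tier that is both more expensive to store in and more expensive to
    retrieve from than another tier breaks this pattern.
    """
    ordered = order_tiers(tiers)
    storage_costs = [storage for _, storage, _ in ordered]
    return all(a > b for a, b in zip(storage_costs, storage_costs[1:]))


# Placeholder costs, deliberately listed out of order.
TIERS = [
    ("flash", 5e-9, 1e-5),
    ("HBM", 1e-6, 0.0),
    ("DDR", 5e-8, 1e-6),
]

print([name for name, _, _ in order_tiers(TIERS)])
print(is_balanced_hierarchy(TIERS))
```

Sorting by retrieve cost puts the tier with zero retrieve cost first, which is one way to read "see where the zeros are" when deciding the order.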

So...