Reiner Pope

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Okay.

7041.85

So if we're just storing it in HBM, it has this sort of cost profile.

7055.045

And then if we store in DDR, it's actually going to take some time.

7060.952

So it's like, we get the same thing here, but it's,

7066.638

bytes per token over DDR capacity times DDR cost per second.

7071.23

But now this has a cost to retrieve that is higher than for HBM, because we need to copy it into the HBM.

7083.855

And so this is bytes per token

7089.887

over DDR bandwidth.

7095.378

And then this consumes some amount of the DDR as well.
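
The two DDR terms described here can be sketched roughly as follows, assuming the per-token quantity is the size in bytes of the cached state for one token. The function names and every numeric value are hypothetical placeholders for illustration, not figures from the episode.

```python
def storage_cost_per_token_second(bytes_per_token, capacity_bytes, tier_cost_per_second):
    """Share of a memory tier's running cost used by one token's cached state.

    (bytes per token / tier capacity) is the fraction of the tier occupied,
    and multiplying by the tier's cost per second prices that fraction.
    """
    return bytes_per_token / capacity_bytes * tier_cost_per_second


def retrieve_seconds_per_token(bytes_per_token, bandwidth_bytes_per_second):
    """Time to copy one token's cached state out of the tier into HBM."""
    return bytes_per_token / bandwidth_bytes_per_second


# Hypothetical placeholder numbers, chosen only to make the arithmetic easy to follow.
BYTES_PER_TOKEN = 1e5          # assumed size of one token's cached state
DDR_CAPACITY_BYTES = 1e12      # assumed DDR capacity
DDR_COST_PER_SECOND = 1e-3     # assumed cost of running that DDR, per second
DDR_BANDWIDTH_BYTES_S = 1e11   # assumed DDR-to-HBM copy bandwidth

if __name__ == "__main__":
    print(storage_cost_per_token_second(
        BYTES_PER_TOKEN, DDR_CAPACITY_BYTES, DDR_COST_PER_SECOND))
    print(retrieve_seconds_per_token(BYTES_PER_TOKEN, DDR_BANDWIDTH_BYTES_S))
```

Under this reading, storing in HBM uses the same storage formula with HBM capacity and cost, and its retrieve term is zero because the data is already where inference needs it.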

7100.995

And every scale-up has DDR and Flash?

7104.425

That's really a deployment question, and so you can choose that.

7107.933

NVIDIA does deploy in this form.

7110.677

It has both.

7112.579

Yeah, I mean, it depends on what you define a retrieve to be.

7119.248

Here I'm defining retrieve to be move it into HBM so that you can start actually doing inference on it.

7122.332

Sort of by definition.

7128.06

So these are three things, and I guess I ordered them wrong.

7136.131

In general, if you're balancing two costs and you've got different tiers in the memory hierarchy, you should expect that as this cost goes up, this cost goes down.

7138.775

So you can kind of see where the zeros are, and I should have ordered them this one first, this one second, and this one third.
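
One way to picture the ordering being described is a minimal sketch that sorts tiers by retrieve cost and checks that storage cost falls as retrieve cost rises. It assumes the three tiers are HBM, DDR and flash, and that the "zero" is HBM's retrieve cost; the `Tier` class, the helper names and all cost values are hypothetical, in arbitrary units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_cost: float   # cost to keep the data resident (arbitrary units)
    retrieve_cost: float  # cost to move the data into HBM (arbitrary units)


# Illustrative placeholder values only; deliberately listed out of order.
TIERS = [
    Tier("DDR", storage_cost=0.2, retrieve_cost=1.0),
    Tier("flash", storage_cost=0.02, retrieve_cost=10.0),
    Tier("HBM", storage_cost=1.0, retrieve_cost=0.0),  # already in HBM, so retrieve is zero
]


def order_by_retrieve_cost(tiers):
    """Return the tiers sorted from cheapest to most expensive to retrieve."""
    return sorted(tiers, key=lambda t: t.retrieve_cost)


def is_consistent_tradeoff(ordered):
    """True if storage cost strictly falls as retrieve cost rises."""
    return all(a.storage_cost > b.storage_cost for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    ordered = order_by_retrieve_cost(TIERS)
    for tier in ordered:
        print(tier.name, tier.storage_cost, tier.retrieve_cost)
    print(is_consistent_tradeoff(ordered))
```

A tier that is both more expensive to store in and more expensive to retrieve from than another would fail the check, which matches the intuition that such a tier would never be worth using.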

7150.673

So...

7161.009