Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

I have to do a certain number of multiplies, a certain amount of GPU time that I spend, in order to produce it.

Storing in HBM.

This really goes as, I think I had a number here, the bytes per token.

So I need to just have some number of bytes per token, and then I need to store this in the HBM.

So it's gonna use up some of my HBM capacity.

So a way to think of this is that if I have too many of these things sitting in my HBM, if I fill up my HBM with just KV caches that I'm not using, I can't use that GPU.

And so how do I price that?

Maybe I say that the cost of it is proportional to the fraction of the HBM I'm using.

So there's also a factor of GPU dollars.
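
A minimal sketch of the pricing idea described here, assuming the holding cost is simply the fraction of HBM occupied multiplied by the GPU's rental rate. The function name and every number below (cache size, HBM capacity, GPU price) are hypothetical and chosen only for illustration; they are not figures from the episode.

```python
def hbm_hold_cost_per_second(num_tokens, bytes_per_token,
                             hbm_capacity_bytes, gpu_dollars_per_second):
    """Dollars per second to keep one KV cache resident in HBM.

    Assumption: cost scales linearly with the fraction of HBM occupied.
    """
    cache_bytes = num_tokens * bytes_per_token
    hbm_fraction = cache_bytes / hbm_capacity_bytes
    return hbm_fraction * gpu_dollars_per_second


# Hypothetical inputs, for illustration only.
cost = hbm_hold_cost_per_second(
    num_tokens=100_000,
    bytes_per_token=100_000,             # assumed 100 KB of KV cache per token
    hbm_capacity_bytes=80e9,             # assumed 80 GB of HBM
    gpu_dollars_per_second=2.0 / 3600,   # assumed $2 per GPU-hour
)
print(f"{cost:.2e} dollars per second")
```

With these made-up inputs the cache occupies one eighth of the HBM, so holding it costs one eighth of the GPU's per-second price.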

And then let's just do one more memory tier and say something like DDR, store in DDR instead.

The same kind of thing goes for Flash and for DDR.

I put these in the wrong columns, actually.

I meant to make two columns.

The distinction I want to make is that there is the cost to retrieve.

And then there's a cost to store, a cost to hold on to it.

And so the cost to hold is a cost per second, whereas the cost to retrieve is an instantaneous cost.

So rematerialization has a cost to retrieve and has zero cost to store it because we've deleted it.
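
The two-column distinction can be sketched as a small cost model: each option has a one-time cost to retrieve and a per-second cost to hold, and the total depends on how long the cache sits idle before reuse. The class name `CacheOption`, the helper `cheapest`, and all dollar figures are hypothetical placeholders; treating HBM retrieval as free is also an assumption made here for simplicity, not something stated in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheOption:
    name: str
    retrieve_dollars: float          # instantaneous cost, paid once on reuse
    hold_dollars_per_second: float   # ongoing cost while the cache sits idle

    def total_dollars(self, idle_seconds: float) -> float:
        return self.retrieve_dollars + self.hold_dollars_per_second * idle_seconds


# Hypothetical costs, for illustration only.
options = [
    CacheOption("hbm", retrieve_dollars=0.0, hold_dollars_per_second=7e-5),
    CacheOption("ddr", retrieve_dollars=1e-4, hold_dollars_per_second=1e-6),
    # Rematerialization: the cache was deleted, so holding costs nothing,
    # but reuse means paying to recompute it.
    CacheOption("rematerialize", retrieve_dollars=5e-3, hold_dollars_per_second=0.0),
]


def cheapest(idle_seconds: float) -> str:
    """Name of the option with the lowest total cost for a given idle time."""
    return min(options, key=lambda o: o.total_dollars(idle_seconds)).name


for idle in (0.0, 10.0, 100_000.0):
    print(idle, cheapest(idle))
```

Under these made-up numbers, a cache that is reused immediately is cheapest in HBM, one that idles for a few seconds is cheapest in DDR, and one that idles for a very long time is cheapest to delete and rematerialize.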

This is the one that I put in the wrong location.

This is actually the cost just to hold on to it.

So I will rewrite it.