Dwarkesh Patel

👤 Speaker

15267 total appearances

Podcast Appearances

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

So, as...

6461.548

the length of the prefix goes up or pass your memory bandwidth time declines, and that means that, to the extent that you were bottlenecked on memory bandwidth before, you can avoid being bottlenecked on memory bandwidth. The fact that they are charging 5x less for

6466.7

pre-fill than decode does suggest that they are bottlenecked on memory bandwidth to quite a degree, such that for them at least, because T is equivalent to cost, right?

6491.54

It's the cost of renting the compute.

6500.929

This is actually like, this would be at one and this would be at five.

6503.311

That's right.

6508.916

Yeah.

6509.197

So it is in fact tremendously memory bandwidth bottlenecked.

6509.617

The real graph looks something like that.

6513.381

So, yeah, let me do it this way.

6522.292

Yeah, that's right.

6525.335

And then this is the gap on decode between the memory and the compute time.

6531.302

Yeah, yeah.

6539.952

Interesting.

6540.512
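The exchange above contrasts memory time and compute time on pre-fill versus decode. A minimal roofline-style sketch of that comparison follows. Every number in it (model size, bytes per parameter, peak FLOPs, memory bandwidth) is a hypothetical placeholder rather than a figure from the episode, and it ignores KV-cache reads and attention FLOPs, so treat it as an illustration of the shape of the argument only.

```python
# Rough roofline-style estimate for one forward pass of a dense model.
# All default values are hypothetical placeholders, not figures from the episode.

def pass_times(n_tokens, n_params=70e9, bytes_per_param=2.0,
               peak_flops=1e15, mem_bandwidth=3e12):
    """Return (compute_seconds, memory_seconds) for one pass over n_tokens."""
    # Roughly 2 FLOPs per parameter per token for the matrix multiplies.
    compute_s = 2.0 * n_params * n_tokens / peak_flops
    # Weights are read from memory once per pass, however many tokens share it.
    memory_s = n_params * bytes_per_param / mem_bandwidth
    return compute_s, memory_s


def time_per_token(n_tokens, **kwargs):
    """Per-token time when the slower of compute and memory bounds the pass."""
    compute_s, memory_s = pass_times(n_tokens, **kwargs)
    return max(compute_s, memory_s) / n_tokens


if __name__ == "__main__":
    # 1 token per pass stands in for decode; many tokens per pass for pre-fill.
    for n in (1, 64, 4096):
        compute_s, memory_s = pass_times(n)
        bound = "memory" if memory_s > compute_s else "compute"
        print(f"{n:5d} tokens/pass: compute {compute_s:.2e} s, "
              f"memory {memory_s:.2e} s, {bound}-bound, "
              f"{time_per_token(n):.2e} s per token")
```

Under these assumed numbers, a one-token decode pass is dominated by the time to stream the weights, while a long pre-fill pass amortizes that same read over many tokens and becomes compute-bound. That is one way a large price gap between pre-fill and decode could arise.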

Another interesting one would be why cache hits are so much cheaper.

6541.754

So I think, if I remember correctly, cache hits are like 10x.

6546.828

It's more expensive to write to cache according to the pricing on all these models.

6550.874

But if you do hit a cache, it's 10x cheaper.

6555.982

So what is going on with... Presumably, this is the cost of keeping something in HBM rather than just evacuating it.

6560.028

But if you do keep it in HBM, then it's cheaper to load again?

6572.405
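To make the cache-pricing remarks above concrete, here is a small sketch of a blended input price as a function of cache hit rate. The 0.1 read multiplier reflects the "10x cheaper" figure recalled in the conversation. The 1.25 write multiplier is purely an assumption, since the conversation only says that writing to cache costs more, and the function names are hypothetical.

```python
# Blended input-token price under prompt caching.
# read_multiplier=0.1 follows the "10x cheaper" remark in the conversation;
# write_multiplier=1.25 is an ASSUMED placeholder for the cache-write premium.

def effective_input_price(base_price, hit_rate,
                          read_multiplier=0.1, write_multiplier=1.25):
    """Average price per input token given the fraction served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Hits pay the discounted read price; misses pay the write premium.
    blended = hit_rate * read_multiplier + (1.0 - hit_rate) * write_multiplier
    return base_price * blended


def break_even_hit_rate(read_multiplier=0.1, write_multiplier=1.25):
    """Hit rate at which caching costs the same as never caching."""
    return (write_multiplier - 1.0) / (write_multiplier - read_multiplier)


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9):
        price = effective_input_price(1.0, rate)
        print(f"hit rate {rate:.1f}: {price:.3f}x base price")
    print(f"break-even hit rate: {break_even_hit_rate():.3f}")
```

With these assumed multipliers, caching pays for itself once a modest fraction of input tokens are cache hits. The real threshold depends on the provider's actual write premium.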
