And we call that the KV cache.
So this process of attending, this single token attending to all of the history of tokens, that's attention.
It is mostly dominated by memory fetches rather than matrix multiplies.
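As a rough back-of-the-envelope for why that is (the notation here is mine, not from the talk): per decode step and per layer, attention reads about $2Ld$ K and V elements from the cache, where $L$ is the context length and $d$ is the KV width, and performs about $4Ld$ FLOPs on them, so

$$\frac{\text{FLOPs}}{\text{bytes moved}} \approx \frac{4Ld}{2\,\text{bytes} \times 2Ld} = 1\ \text{FLOP/byte}$$

at bf16 precision, which is orders of magnitude below the hundreds of FLOPs per byte of bandwidth that modern accelerators can sustain, hence memory-bound.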
So we've got the amount of memory that we're fetching shown over here.
And then that's, of course, just divided by the memory bandwidth. So the memory bytes per second.
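In symbols, a plausible reading of the memory time being described (again, notation mine): with $W$ the weight bytes, $K_L$ the KV-cache bytes per sequence at context length $L$, $B$ the batch size, and $\beta$ the memory bandwidth in bytes per second,

$$T_\text{mem} = \frac{\text{bytes fetched}}{\beta} = \frac{W + B \cdot K_L}{\beta}.$$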
So in fact, these equations here are actually enough for us to now draw some fit lines.
And so the things that we'd like to look at are sensitivity to batch size, and then also, which we'll draw separately, sensitivity to context lengths.
So we said that the big effect you can get is a trade-off between latency and cost via batch size.
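One way to make that trade-off precise, writing $T(B)$ for the time of one decode step at batch size $B$ (notation mine):

$$\text{latency} = T(B), \qquad \text{cost per token} \propto \frac{T(B)}{B}.$$

Growing $B$ amortizes the fixed weight fetch and lowers cost per token, at the price of a longer step, until the compute term takes over and the cost flattens.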
So let's draw them out.
I think there's just really two graphs we want to draw.
We'll first just draw batch size versus time here.
So when we look at the shape of this, we've got a maximum over two things: a compute term, and a sum of two memory terms.
So let's look at these terms one by one: how the compute and memory times scale with batch size, and how they show up on the graph.
So let's first look at this compute time.
This is just purely linear in batch size with no offset.
So it is some curve like this.
This is T compute.
And then on the memory side, we've got one portion that is just constant, a base offset, which is the weight fetch.
And then finally, we have this term here, the KV fetch, which grows linearly with batch size, and we'll draw that on top of the weight-fetch offset.
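Here is a minimal numeric sketch of those two curves. Every number in it is invented purely to illustrate the shapes; none of them are the speaker's figures, and they don't describe any particular chip or model.

```python
import numpy as np

# All numbers below are invented for illustration only.
FLOPS = 300e12        # peak matmul throughput, FLOP/s
MEM_BW = 1.5e12       # HBM bandwidth, bytes/s
PARAMS = 50e9         # parameter count
WEIGHT_BYTES = 2 * PARAMS                   # bf16 weights
KV_BYTES_PER_SEQ = 2 * 2 * 8192 * 40 * 128  # K,V * bf16 * ctx * layers * KV width

def step_time(batch):
    """One decode step: the max of the compute time and the memory time."""
    # Compute time: ~2 FLOPs per parameter per token, linear in batch, no offset.
    t_compute = 2 * PARAMS * batch / FLOPS
    # Memory time: weights are fetched once per step (the constant base offset),
    # and the KV cache is fetched once per sequence (the term linear in batch).
    t_memory = (WEIGHT_BYTES + KV_BYTES_PER_SEQ * batch) / MEM_BW
    return np.maximum(t_compute, t_memory)

for b in [1, 8, 64, 512]:
    t = step_time(b)
    print(f"batch {b:4d}: step time {t*1e3:8.2f} ms, cost {t/b*1e6:8.1f} us/token")
```

With these made-up numbers, the step time is nearly flat at small batch sizes, since the constant weight fetch dominates, so cost per token falls steeply; once the compute line crosses the memory line, the step time grows linearly and the cost per token flattens out.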