Reiner Pope
which is linear in batch size.
So it looks like that.
So the total latency is the sum of these two memory times, maxed with the compute time.
So let's at least first draw the sum.
So the two memory times, summed together, end up looking like this sloped curve.
And then we get the overall latency, which I'll draw a little thicker here, as the maximum of these two curves.
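The max-of-two-curves model above can be sketched in a few lines of Python. The function name and all constants here are illustrative assumptions, not the speaker's actual numbers:

```python
def latency_s(batch, param_bytes, kv_bytes_per_seq, mem_bw, flops, params):
    """Per-decode-step latency as the max of memory time and compute time."""
    # Memory time: stream all weights once (batch-independent), plus the
    # KV cache for every sequence in the batch (linear in batch size).
    mem_time = (param_bytes + batch * kv_bytes_per_seq) / mem_bw
    # Compute time: ~2 FLOPs per parameter per token, linear in batch size.
    compute_time = 2 * params * batch / flops
    # Overall latency is the max of the two curves.
    return max(mem_time, compute_time)

# Illustrative numbers: a 70B-parameter model in bf16 (140e9 bytes of
# weights) on a chip with ~3.35e12 B/s memory bandwidth and ~1e15 FLOP/s.
print(latency_s(1, 140e9, 1e8, 3.35e12, 1e15, 70e9))
```

At small batch the weight-read term dominates and the curve is nearly flat; as the batch grows, the KV-cache and compute terms take over.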
Make sense?
Okay, so what does this mean actually?
So this is a latency plot.
So if I grow my batch size, I get initially some not very strong dependence on batch size.
And so there's a lower bound on latency here, a latency lower bound.
So this already partially answers the question.
For a given hardware configuration, and we can talk about varying the hardware configuration later, there is a lower bound on latency, which is simply the time I need to read all of my parameters from memory into the chips.
And that takes a certain amount of time.
If I use all of my memory bandwidth, I can't do any better than that.
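As a worked example of that floor (all numbers are assumptions chosen only to make the arithmetic concrete): streaming a 70B-parameter bf16 model through roughly H100-class memory bandwidth gives a floor of about 42 ms per decode step.

```python
param_count = 70e9       # hypothetical 70B-parameter model
bytes_per_param = 2      # bf16 weights
mem_bw = 3.35e12         # bytes/s, roughly H100-class HBM bandwidth
# Latency floor: every decode step must read all weights from memory once.
lower_bound_s = param_count * bytes_per_param / mem_bw
print(f"{lower_bound_s * 1e3:.1f} ms")  # ~42 ms per step
```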
Yeah, this is really sensitive to the context length.
So I think we should come back and explore this.
As you grow the context length, the KV fetch time will go up and up.
And so that'll cause a transition from compute limited to memory limited.
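That transition can be sketched numerically, again with made-up numbers: the KV-cache read is linear in context length, so past some crossover context the memory time overtakes the compute time.

```python
def kv_fetch_time_s(batch, context_len, kv_bytes_per_token, mem_bw):
    # Each decode step re-reads the whole KV cache for every sequence.
    return batch * context_len * kv_bytes_per_token / mem_bw

def compute_time_s(batch, params, flops):
    # ~2 FLOPs per parameter per generated token.
    return 2 * params * batch / flops

# Hypothetical setup: batch 256, 70B params, 1e15 FLOP/s of compute,
# 3.35e12 B/s of memory bandwidth, 100 KB of KV cache per token.
batch, params, flops, mem_bw, kv_per_tok = 256, 70e9, 1e15, 3.35e12, 1e5
for ctx in (1024, 8192, 65536):
    kv_t = kv_fetch_time_s(batch, ctx, kv_per_tok, mem_bw)
    limited = "memory-limited" if kv_t > compute_time_s(batch, params, flops) \
        else "compute-limited"
    print(ctx, limited)
```

With these particular numbers the crossover falls between a 1K and an 8K context; the point is only that such a crossover always exists somewhere.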