Reiner Pope

If you look at the cost per token, or the number of FLOPs per token, as a function of context length, there are the FLOPs that come from doing the weight matrix multiplies.

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And then there is the number of multiplies that comes from attending over the KV cache, which goes up linearly with the amount of stuff you attend to.

The slope on this is so low that when you draw it like this, it's very well approximated by a flat line.

So you start to notice the effect of the quadratic or the linear term up in the millions of tokens or so.
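
As a rough sketch of the shape being described, the per-token FLOPs can be modeled as a constant weight term plus a term that grows linearly with context length. The function names and both numeric constants below are hypothetical placeholders chosen only to illustrate the curve; they are not figures from the episode or measurements of any real model.

```python
def flops_per_token(context_len, weight_flops, attn_flops_per_ctx_token):
    """FLOPs for one token: a constant weight-matmul term plus a
    term that grows linearly with the number of tokens attended to."""
    return weight_flops + attn_flops_per_ctx_token * context_len


def crossover_context(weight_flops, attn_flops_per_ctx_token):
    """Context length at which the linear attention term equals the
    constant weight term (where the line stops looking flat)."""
    return weight_flops / attn_flops_per_ctx_token


# Hypothetical placeholder values (assumptions, not from the source).
WEIGHT_FLOPS = 2e11  # constant FLOPs per token from weight multiplies
ATTN_SLOPE = 1e5     # extra FLOPs per token for each attended token

if __name__ == "__main__":
    for ctx in (1_000, 100_000, 2_000_000):
        total = flops_per_token(ctx, WEIGHT_FLOPS, ATTN_SLOPE)
        attn_share = ATTN_SLOPE * ctx / total
        print(f"context={ctx:>9,}  attention share of FLOPs={attn_share:.4f}")
```

With a small slope relative to the constant term, the attention share stays close to zero across short contexts and only becomes comparable to the weight term near the crossover point.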

So just not super relevant.

Yeah, so there are two costs of long context.

One is the memory bandwidth cost, which we've spent a lot of time analyzing.

That's this thing.

And then the other one is the compute cost.

The compute cost is almost always, and is more or less forced by fundamental principles to be, a much smaller slope than the memory bandwidth cost.

And so the primary things that limit you from having really large contexts are memory bandwidth and memory capacity, which is exactly this effect.
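
One common way to see why the memory slope tends to dominate is to compare, for each extra token of context, the time to read its KV cache entries against the time to do the attention arithmetic on them. The sketch below assumes single-sequence decoding with standard multi-head or grouped-query attention; the function names, the model shape, and the hardware numbers are all hypothetical placeholders, not values from the episode.

```python
def kv_bytes_per_context_token(n_layers, n_kv_heads, d_head, bytes_per_elem=2):
    """Bytes of KV cache stored (and read each decode step) per attended
    token: one key and one value vector per KV head per layer."""
    return 2 * n_layers * n_kv_heads * d_head * bytes_per_elem


def attn_flops_per_context_token(n_layers, n_q_heads, d_head):
    """Attention FLOPs per attended token for one new token: a query-key
    dot product and a value accumulation, each about 2 * d_head FLOPs
    per query head per layer."""
    return 4 * n_layers * n_q_heads * d_head


def time_slopes(n_layers, n_q_heads, n_kv_heads, d_head,
                peak_flops, mem_bandwidth, bytes_per_elem=2):
    """Seconds added per extra context token: (memory-bound, compute-bound)."""
    mem = kv_bytes_per_context_token(
        n_layers, n_kv_heads, d_head, bytes_per_elem) / mem_bandwidth
    comp = attn_flops_per_context_token(n_layers, n_q_heads, d_head) / peak_flops
    return mem, comp


if __name__ == "__main__":
    # Hypothetical model shape and accelerator (assumptions for illustration).
    mem, comp = time_slopes(
        n_layers=64, n_q_heads=64, n_kv_heads=8, d_head=128,
        peak_flops=1e15,     # FLOPs per second
        mem_bandwidth=3e12,  # bytes per second
    )
    print(f"memory slope : {mem:.3e} s per context token")
    print(f"compute slope: {comp:.3e} s per context token")
    print(f"ratio        : {mem / comp:.1f}x")
```

Under these assumptions each byte of KV cache read supports only a handful of FLOPs, while the accelerator can perform hundreds of FLOPs in the time it takes to read one byte, so the memory term grows faster with context. Multiplying `kv_bytes_per_context_token` by the context length also gives the cache size per sequence, which is the memory capacity side of the same limit.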

Yeah.
