Reiner Pope

Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

I have to rerun the compute.

at whatever speed my GPU does it, and then I multiply it by my GPU dollars per second.
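The arithmetic described here (time to redo the compute, multiplied by a GPU price per second) can be sketched as below. This is only an illustration: the function name and all example numbers are hypothetical and do not come from the episode.

```python
def rerun_cost_dollars(flops_to_rerun, gpu_flops_per_second, gpu_dollars_per_second):
    """Cost of redoing a chunk of compute on one GPU.

    All three inputs are assumptions supplied by the caller:
    - flops_to_rerun: how much work has to be redone
    - gpu_flops_per_second: the speed the GPU actually achieves
    - gpu_dollars_per_second: the rental price of the GPU
    """
    seconds = flops_to_rerun / gpu_flops_per_second
    return seconds * gpu_dollars_per_second


if __name__ == "__main__":
    # Hypothetical numbers: 2e15 flops of work, 1e15 flops/s, $0.001 per second.
    print(rerun_cost_dollars(2e15, 1e15, 0.001))
```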

Yeah, so there is a quadratic term.

It shows up in the compute.

As an approximation, I chose to remove it.

I'll just show you sort of quickly what that looks like.

It's because... So you have the...

If you look at the cost per token, or the number of flops per token, as a function of context length, there are the flops that come from doing the weight matrix multiplies.

And then there is the number of multiplies that comes from attending to the KV cache, which goes up linearly with the amount of stuff you attend to.

The slope on this is so low that when you draw it like this, it's very well approximated by a flat line.

So you start to notice the effect of the quadratic or the linear term up in the millions of tokens or so.

So just not super relevant.
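The two flop terms described above, a flat one from the weight matrix multiplies and one that grows linearly with the amount of context attended to, can be written out as a rough sketch. The formulas are the usual back-of-the-envelope counts (2 flops per active parameter, plus a query-key dot product and a value-weighted sum per attended token per layer). They are an assumption about the kind of model being discussed, not something stated in the episode, and every function and parameter name is hypothetical. Where the linear term becomes visible depends entirely on the sizes plugged in.

```python
def weight_flops_per_token(n_active_params):
    # One multiply and one add per active weight: independent of context length.
    return 2 * n_active_params


def attention_flops_per_token(context_len, n_layers, d_attn):
    # Per layer and per attended token: a query-key dot product (2 * d_attn flops)
    # plus a value-weighted sum (2 * d_attn flops). d_attn is the total query
    # width summed over heads (an assumption about the architecture).
    return 4 * n_layers * d_attn * context_len


def total_flops_per_token(context_len, n_active_params, n_layers, d_attn):
    return (weight_flops_per_token(n_active_params)
            + attention_flops_per_token(context_len, n_layers, d_attn))


def crossover_context(n_active_params, n_layers, d_attn):
    # Context length at which the linear attention term equals the flat weight term.
    return weight_flops_per_token(n_active_params) / (4 * n_layers * d_attn)


if __name__ == "__main__":
    # Made-up model shape, purely for illustration.
    shape = dict(n_active_params=1e11, n_layers=50, d_attn=1000)
    for ctx in (1_000, 10_000, 100_000, 1_000_000):
        attn = attention_flops_per_token(ctx, shape["n_layers"], shape["d_attn"])
        share = attn / total_flops_per_token(ctx, **shape)
        print(f"context={ctx:>9,}  attention share of flops={share:.3f}")
```

Printing the attention share at a few context lengths shows how slowly the total rises while the flat weight term dominates.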

Yeah, so there are two costs of long context.

One is the memory bandwidth cost, which we've spent a lot of time analyzing.

That's this thing.

And then the other one is the compute cost.

The compute cost is almost always, and sort of actually forced by fundamental principles to be, a much smaller slope than the memory bandwidth cost.

And so the primary things that limit you from having really large contexts are memory bandwidth and memory capacity, which is exactly this effect.
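One way to see why the compute slope tends to sit well below the memory bandwidth slope is to compare, per extra token of context, the time to read that token's KV cache entries with the time to do the attention arithmetic on them. The sketch below assumes a simple decode step in which every cached key and value is read once and used for one dot product and one weighted sum. The hardware figures are round hypothetical numbers, and all names are illustrative, not taken from the episode.

```python
def kv_bytes_per_context_token(n_layers, d_kv, bytes_per_value):
    # One key vector and one value vector of width d_kv per layer.
    return 2 * n_layers * d_kv * bytes_per_value


def attn_flops_per_context_token(n_layers, d_attn):
    # Query-key dot product plus value-weighted sum, per layer.
    return 4 * n_layers * d_attn


def memory_to_compute_slope_ratio(n_layers, d_attn, d_kv, bytes_per_value,
                                  peak_flops_per_s, mem_bytes_per_s):
    # Seconds added per extra context token by each resource, and their ratio.
    mem_seconds = kv_bytes_per_context_token(n_layers, d_kv, bytes_per_value) / mem_bytes_per_s
    compute_seconds = attn_flops_per_context_token(n_layers, d_attn) / peak_flops_per_s
    return mem_seconds / compute_seconds


if __name__ == "__main__":
    # Hypothetical accelerator: 1e15 flops/s and 3e12 bytes/s of memory bandwidth.
    # Hypothetical model: 50 layers, 16-bit cache values, keys/values as wide as queries.
    ratio = memory_to_compute_slope_ratio(
        n_layers=50, d_attn=1000, d_kv=1000, bytes_per_value=2,
        peak_flops_per_s=1e15, mem_bytes_per_s=3e12)
    print(f"memory slope / compute slope = {ratio:.0f}")
```

Under these assumptions each cached value supports only a couple of flops per byte read, while the hypothetical chip can do hundreds of flops in the time it takes to read one byte, so the memory term dominates. Narrower key/value widths (a smaller `d_kv` relative to `d_attn`) shrink the ratio but leave the same structure.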

Yeah.