Reiner Pope

Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

I mean, sparse attention gives you a get-out for sure, because you get this square root. It gives you a big improvement.
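
One way to read "this square root": with dense attention, every decode step reads all N cached key/value entries, whereas a sparse pattern can get that down to roughly √N. The sketch below counts the positions read under a hypothetical strided pattern (a local window of about √N recent tokens plus every √N-th earlier token, in the spirit of fixed/strided sparse attention). The pattern and the function names are illustrative assumptions, not the specific scheme discussed here.

```python
import math

def dense_positions(n: int) -> list[int]:
    """Dense causal attention: the newest token attends to every cached position."""
    return list(range(n))

def strided_sparse_positions(n: int) -> list[int]:
    """Hypothetical sqrt(N) pattern (illustrative assumption): a local window of
    ~sqrt(N) recent positions plus every stride-th earlier position, stride ~ sqrt(N)."""
    stride = max(1, math.isqrt(n))
    local = set(range(max(0, n - stride), n))   # recent window
    strided = set(range(0, n, stride))          # one "summary" slot per block
    return sorted(local | strided)

for n in (8_192, 131_072, 1_048_576):
    d = len(dense_positions(n))
    s = len(strided_sparse_positions(n))
    print(f"N={n:>9,}: dense reads {d:>9,} KV entries, sparse reads {s:>5,} ({d / s:,.0f}x fewer)")
```

The sparse count stays at about 2√N, so the saving itself grows with context length, which is why it is a big improvement rather than a constant-factor one.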

But I think it's like, if you look at the history of context lengths of models, from earlier models like GPT-3, maybe to GPT-4 (I don't remember exactly when the transition happened), they shot up from about 8K to 100K, 200K.

And then for the last year or two, they've all been hovering around there.

I think that actually indicates that that's sort of the reasonably balanced cost point.

And going massively beyond that would be cost prohibitive.

Because of the memory bandwidth cost, yeah.
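
To see why memory bandwidth is the binding cost: during decode, every new token has to stream the sequence's entire KV cache out of HBM. Here is a back-of-the-envelope sketch, assuming a hypothetical model (80 layers, 8 KV heads, head dimension 128, 2-byte bf16 entries) and an assumed ~3 TB/s of HBM bandwidth; none of these numbers come from the conversation.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Hypothetical model and hardware (illustrative round numbers, not any specific product).
PER_TOKEN = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2)
HBM_BYTES_PER_SEC = 3e12  # assumed ~3 TB/s of HBM bandwidth

def kv_read_seconds(context_tokens: int) -> float:
    """Time to stream one sequence's whole KV cache from HBM once (one decode step)."""
    return PER_TOKEN * context_tokens / HBM_BYTES_PER_SEC

for ctx in (8_192, 131_072, 1_048_576):
    gib = PER_TOKEN * ctx / 2**30
    print(f"{ctx:>9,} tokens: {gib:6.1f} GiB of KV cache, "
          f"{kv_read_seconds(ctx) * 1e3:6.1f} ms per decode step")
```

Under these assumptions, a 128K-token context is 40 GiB of KV cache and roughly 14 ms of pure HBM traffic per generated token, growing linearly with context length. Unlike the weights, which are read once per step and shared across a batch, each sequence has its own KV cache, so this traffic also grows with batch size.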

So I actually don't see a very good path to solving that.

HBM is where it is.

It's not getting hugely better.

Sparse attention is a big improvement.

Maybe that's priced in already.

It's not an infinite improvement because if you go too sparse, you lose too much quality.

But yeah, I mean, the empirical result is that context lengths haven't been increasing that much.

And I think it's because there is no solution to the memory wall here.

Interesting.

So going too sparse just means you're attending to a very small subset of the tokens and the quality will get worse.
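
To make "attending to a very small subset of the tokens" concrete, here is a toy sketch using top-k selection, one common flavor of sparse attention (not necessarily the one meant here): keep only the k highest-scoring keys, renormalize the softmax over them, and measure how far the output drifts from dense attention. The random data, the sizes, and the `attend` helper are made up for illustration.

```python
import math
import random

def attend(q, keys, values, k=None):
    """Single-query scaled dot-product attention over cached keys/values.
    With k set, keep only the k highest-scoring positions (top-k sparse
    attention) and renormalize the softmax over just those."""
    scale = 1 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]
    if k is None:
        idx = list(range(len(keys)))
    else:
        idx = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in idx)              # subtract the max for numerical stability
    weights = {i: math.exp(scores[i] - m) for i in idx}
    z = sum(weights.values())
    dim = len(values[0])
    return [sum(weights[i] * values[i][j] for i in idx) / z for j in range(dim)]

rng = random.Random(0)                            # fixed seed: toy data only
n, d = 256, 16
q = [rng.gauss(0, 1) for _ in range(d)]
keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

dense = attend(q, keys, values)
for k in (256, 64, 16, 4, 1):
    err = math.dist(dense, attend(q, keys, values, k))  # L2 distance to the dense output
    print(f"top-{k:<3} of {n}: L2 error vs dense = {err:.4f}")
```

With k equal to the full length the result matches dense attention; as k shrinks, the dropped probability mass shows up as error. How much error is tolerable depends on how concentrated the real attention weights are, which this toy data does not model.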

So what is the cost of these different ways of producing, or resynthesizing, the KV cache?

Computing it from scratch is based on my GPU time.
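
For the "compute it from scratch" option, here is a rough sketch of the cost: a prefill over the whole context, using the common rule of thumb of about 2 FLOPs per parameter per token for a dense transformer. For comparison it also times reloading an already-computed cache over some storage path. Every number here (a 70B-parameter model, 320 KiB of KV cache per token, ~1 PFLOP/s peak at 50% utilization, a 50 GB/s link) is a hypothetical placeholder.

```python
def prefill_flops(params: float, tokens: int) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token for a dense transformer
    # forward pass (ignores attention-score FLOPs, which grow with context).
    return 2 * params * tokens

def recompute_seconds(params: float, tokens: int, peak_flops: float, utilization: float) -> float:
    return prefill_flops(params, tokens) / (peak_flops * utilization)

def reload_seconds(cache_bytes: float, bytes_per_sec: float) -> float:
    return cache_bytes / bytes_per_sec

# Hypothetical numbers for illustration only.
PARAMS = 70e9                     # assumed 70B-parameter dense model
TOKENS = 131_072                  # 128K-token context
PEAK = 1e15                       # assumed ~1 PFLOP/s peak per accelerator
MFU = 0.5                         # assumed 50% utilization during prefill
CACHE_BYTES = 327_680 * TOKENS    # assumed 320 KiB of KV cache per token
LINK = 50e9                       # assumed 50 GB/s path from wherever the cache is stored

print(f"recompute: {recompute_seconds(PARAMS, TOKENS, PEAK, MFU):.1f} accelerator-seconds")
print(f"reload:    {reload_seconds(CACHE_BYTES, LINK):.2f} s of transfer at the assumed link speed")
```

Under these simplifications both costs scale linearly with context length, so the comparison comes down mostly to the assumed utilization and link bandwidth: with these placeholders, recomputing costs about 36.7 accelerator-seconds against under a second of transfer, and the link would have to be more than 40 times slower before recomputing came out ahead.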