Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

This becomes the dominant term.

The KV cache becomes the dominant term.
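A rough sketch of why the KV cache dominates at large batch and long context. The model shapes below are assumed (Llama-70B-like: 80 layers, 8 KV heads of dimension 128) and are not numbers from the episode; the point is only the scaling: per decode step, the weights are streamed once for the whole batch, while KV cache traffic grows with both batch size and sequence length.

```python
BYTES = 2  # bf16

# Assumed, illustrative model shapes (roughly Llama-70B-like)
n_layers, n_kv_heads, head_dim = 80, 8, 128
params = 70e9  # approximate parameter count

def bytes_per_decode_step(batch, seq_len):
    """Memory traffic per decode step: weights vs. KV cache."""
    weight_bytes = params * BYTES  # weights read once, shared by the batch
    # 2x for keys and values; KV traffic scales with batch * seq_len
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * BYTES * seq_len * batch
    return weight_bytes, kv_bytes

w, kv = bytes_per_decode_step(batch=256, seq_len=8192)
# At this batch and context, kv > w: KV cache is the dominant term.
```

At small batch or short context the weight term wins; the crossover is exactly why serving economics change as contexts grow.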

Yeah, you only need to store like one layer rather than two layers of KVs, right?

Yeah.

So it helps from that perspective.

Yeah.

What's competing with that, though, is that you need to be keeping all of the racks usefully busy at a time.

And so the number of sequences that are in flight simultaneously has gone up.

Yeah, yeah, yeah.

Makes sense, makes sense, makes sense.

So those exactly cancel, and you end up not getting a saving per GPU.
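The cancellation being described can be checked with small arithmetic. This is a hedged sketch with illustrative numbers, not figures from the episode: sharding across P pipeline stages cuts the layers per GPU by P, but keeping all P stages busy requires roughly P times as many sequences in flight, so the per-GPU KV cache footprint is unchanged.

```python
def kv_bytes_per_gpu(n_layers, n_kv_heads, head_dim, seq_len,
                     seqs_per_stage, pipeline_stages, bytes_per_elem=2):
    """Per-GPU KV cache footprint under pipeline parallelism."""
    layers_per_gpu = n_layers // pipeline_stages
    # To keep every stage usefully busy, in-flight sequences scale with P.
    total_seqs_in_flight = seqs_per_stage * pipeline_stages
    # 2x for keys and values
    return (2 * layers_per_gpu * n_kv_heads * head_dim * seq_len
            * total_seqs_in_flight * bytes_per_elem)

base = kv_bytes_per_gpu(80, 8, 128, 4096, 1, pipeline_stages=1)
piped = kv_bytes_per_gpu(80, 8, 128, 4096, 1, pipeline_stages=8)
assert base == piped  # the factors of P cancel: no per-GPU KV saving
```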

Well, so first we said you can't amortize KV caches across batch size.

And now we're saying you also can't shard it across pipeline stages.

It sucks from both of those points of view.

Yeah, yeah, yeah.

Interesting.

Okay, so then what does that look like during inference?

So, I mean, the DeepSeek paper reports what they do, which is they just do a lot of expert parallelism.

You should...

In effect, you should increase your expert parallelism up to your scale-up domain size, and then do very little pipelining.
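The layout rule described here can be sketched as a tiny heuristic. This is a hedged illustration, not the actual scheme from the episode or from any serving framework: fill the scale-up (fast-interconnect) domain with expert parallelism first, then use what remains of the cluster for pipelining. The function name and numbers are hypothetical.

```python
def choose_layout(n_gpus, scaleup_domain, n_experts):
    """Hypothetical heuristic: expert parallelism fills the fast domain,
    pipeline parallelism spans the remaining (slower) dimension."""
    ep = min(scaleup_domain, n_experts, n_gpus)  # fill the scale-up domain
    pp = n_gpus // ep                            # then pipeline across domains
    return {"expert_parallel": ep, "pipeline": pp}

layout = choose_layout(n_gpus=64, scaleup_domain=8, n_experts=256)
# -> {'expert_parallel': 8, 'pipeline': 8}: EP capped at the 8-GPU
#    scale-up domain, with pipelining kept to the minimum needed.
```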