Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Maybe none at all, maybe two, just enough to make the weight storage not too big of an issue.

Those are the only two parallelisms that really make sense.

In the past, there was tensor parallelism, which cut up the computation within an expert, but the experts are so small now that it's not a profitable optimization.
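
To make that tradeoff concrete, here is a back-of-envelope sketch (not from the transcript; all hardware and model numbers are illustrative assumptions) of why sharding a small expert's matmuls across chips stops paying off: the per-chip compute shrinks with the shard count while the all-reduce of the activations does not.

```python
# Illustrative sketch: tensor-parallel sharding of one small MoE expert
# during decode. All numbers below are assumptions, not measurements.

def expert_shard_times(d_model=4096, d_ff=2048, tokens=1, n_shards=8,
                       chip_flops=1e15,   # assumed peak FLOP/s per chip
                       link_bw=4.5e11):   # assumed interconnect bytes/s
    # FLOPs for the expert's up- and down-projection matmuls
    # (2 matmuls, 2 FLOPs per multiply-accumulate).
    flops = 2 * tokens * d_model * d_ff * 2
    compute_s = flops / n_shards / chip_flops
    # Tensor parallelism ends with an all-reduce of the activations
    # (bf16, ~2 bytes per element; factor 2 for ring all-reduce traffic).
    comm_bytes = 2 * tokens * d_model * 2
    comm_s = comm_bytes / link_bw
    return compute_s, comm_s

compute_s, comm_s = expert_shard_times()
print(f"compute/shard: {compute_s * 1e9:.1f} ns, all-reduce: {comm_s * 1e9:.1f} ns")
# With an expert this small, the all-reduce dwarfs the per-shard compute,
# which is the sense in which the optimization is no longer profitable.
```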

Yes.

Yeah, I mean, you can look at how it depends on model size.

Like, you could have a very large model, like one that exceeds the memory of a rack, and there you should be doing a bit of pipelining.
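
As a rough sanity check on that threshold, here is a minimal sketch (assumed rack and model sizes, not figures from the episode) of how many pipeline stages it takes to fit a model's weights once they exceed one rack's HBM:

```python
import math

# Minimal sketch with assumed numbers: weight bytes vs. one rack's HBM.

def pipeline_stages(params, bytes_per_param=1,  # e.g. fp8/int8 weights
                    chips_per_rack=64, hbm_per_chip_gb=96):
    rack_hbm_bytes = chips_per_rack * hbm_per_chip_gb * 1e9
    weight_bytes = params * bytes_per_param
    return max(1, math.ceil(weight_bytes / rack_hbm_bytes))

# A hypothetical 10-trillion-parameter model at 1 byte per parameter:
print(pipeline_stages(10e12))  # -> 2 racks, i.e. "a bit of pipelining"
```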

Maybe it's extremely sparse, for example, and that would be a reason to do it.

Actually, so pipelining doesn't help with context length.

It totally helps with model size.

And so because of the ability to do pipelining, at least a rack should not be a constraint on your ability to fit the model parameters.

I guess the other consideration you're asking about is: why hasn't it scaled up more, and why would bigger scale-up domains help?

So we talked through one aspect of that: we said it's not because of memory capacity.

We have a solution to memory capacity, at least with respect to model size, though not with respect to KV cache size.
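
For contrast, here is the standard KV-cache sizing arithmetic (the model shape below is an assumption, not from the episode); unlike the weights, the cache grows linearly with both context length and batch size, which is why pipelining the parameters doesn't solve it:

```python
# Standard KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * bytes per element * sequence length * batch. Shape is illustrative.

def kv_cache_bytes(layers=64, kv_heads=8, head_dim=128,
                   bytes_per_elem=2,        # bf16
                   seq_len=128_000, batch=32):
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch

print(f"{kv_cache_bytes() / 1e9:.0f} GB")  # ~1074 GB at these settings
```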

The other issue that shows up is latency.

This is very much dependent on the hardware.

It's

I can't say with a lot of authority.

I think it's probably on the order of a few milliseconds, but I could be off by an order of magnitude.

Yeah.

Okay, so that's not that much.