Dwarkesh Patel

Speaker
15267 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

keep write everything you want to it or take everything out of it.

Or we don't want to be in a situation where our ability to write back and forth is so big, or sorry, so small compared.

Yeah, makes sense.

Makes a ton of sense.

Okay, so a couple of actually quick questions.

One, if it is the case that the optimal batch size is something like 2,000, and that's actually true, it's totally dependent on sparsity.

It's not dependent on the model size or anything.
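
One way to see why the batch size could depend on sparsity and not on model size is a rough roofline-style estimate. The sketch below is not taken from the episode: it assumes a decode step must load every weight once, that compute costs about 2 FLOPs per active parameter per sequence, and that KV-cache traffic and communication can be ignored. The function name and all hardware and model numbers are hypothetical.

```python
def critical_batch_size(total_params, active_params, bytes_per_param,
                        peak_flops, mem_bandwidth):
    """Batch size at which one decode step stops being memory-bound.

    Assumed cost model (an illustration, not the speakers' derivation):
      weight-load time = total_params * bytes_per_param / mem_bandwidth
      compute time     = 2 * active_params * batch / peak_flops
    Setting the two equal and solving for batch gives the value returned.
    KV-cache traffic and inter-chip communication are ignored.
    """
    sparsity = total_params / active_params  # total-to-active parameter ratio
    return sparsity * bytes_per_param * peak_flops / (2 * mem_bandwidth)


# Hypothetical accelerator and model numbers, chosen only for illustration.
PEAK_FLOPS = 1e15      # FLOP/s
MEM_BANDWIDTH = 4e12   # bytes/s
BYTES_PER_PARAM = 2    # 16-bit weights

# Two sparse models with the same 8:1 total-to-active ratio, 10x apart in size.
small = critical_batch_size(80e9, 10e9, BYTES_PER_PARAM, PEAK_FLOPS, MEM_BANDWIDTH)
large = critical_batch_size(800e9, 100e9, BYTES_PER_PARAM, PEAK_FLOPS, MEM_BANDWIDTH)
# A dense model, where every parameter is active.
dense = critical_batch_size(80e9, 80e9, BYTES_PER_PARAM, PEAK_FLOPS, MEM_BANDWIDTH)

print(small, large, dense)
```

Under these assumptions the parameter counts enter only through their ratio, so the two sparse models land on the same batch size (about 2,000 with these made-up numbers) while the dense model lands on a much smaller one.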

But that's a very interesting result.

And that seems to imply that you can...

One question is, how much of a push towards centralization is it that you would have these economies of scale from inference, from batching?

But it seems like it's not that big a deal.

Like, I don't know, is 2,000 users at the same time a lot?

It doesn't seem like a lot?

But I mean, Gemini is big.

That's actually one thousandth of Gemini is a lot.

To actually be like...

To be competitive at scale, you need to be able to serve at least one thousandth of Gemini?

Yeah.

That's interesting.

Cool.