
Reiner Pope

👤 Speaker
1157 total appearances


Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

So we'll just fill this in.

We'll have the.

Nice.

Yeah, so we've got the two and three.

So let's split this batch.

This batch will be the global batch size.

So B is going to be the number of micro-batches

times the batch size per micro-batch.
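The relation being set up here can be written out directly. A minimal sketch, with illustrative numbers (the micro-batch size echoes the "2000-ish" figure mentioned below; the count of four micro-batches matches the diagram discussed later):

```python
# Illustrative numbers only -- not values stated as exact in the transcript.
num_microbatches = 4      # micro-batches in flight (0, 1, 2, 3 in the diagram)
microbatch_size = 2048    # batch size per micro-batch ("2000-ish")

# B, the global batch size, is the product of the two.
global_batch_size = num_microbatches * microbatch_size
print(global_batch_size)  # 8192
```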

So how many micro-batches do we need?

So the number of micro-batches in this diagram is four: zero, one, two, three.

And then the micro-batch size

This is still this, like, 2000-ish number.

This is the one that is, like... This is the, like, 2000 times sparsity.

Sorry, no, this is the 300 times sparsity.

300 times sparsity.

Right, yes.

This is going to be the 20-millisecond train step.

So the global batch size is the number of micro-batches times the local batch size.

Local batch size is set by this hardware parameter.
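One plausible reading of "this hardware parameter" (an assumption on my part, not something the transcript spells out) is the accelerator's FLOPs-to-memory-bandwidth ratio: to keep the matmuls compute-bound rather than bandwidth-bound, the per-chip batch needs to be at least roughly that many tokens, which is one way a "2000-ish" number can arise. A sketch with made-up chip numbers:

```python
# Hedged sketch, assuming the hardware parameter is the chip's
# FLOPs : memory-bandwidth ratio (the arithmetic intensity needed
# to be compute-bound). Both chip numbers below are assumptions.
peak_flops = 1.0e15       # 1 PFLOP/s of matmul throughput (assumed)
hbm_bandwidth = 5.0e11    # 500 GB/s of memory bandwidth (assumed)

# Minimum local batch (in tokens) to keep weight reads amortized:
local_batch_size = peak_flops / hbm_bandwidth
print(local_batch_size)  # 2000.0
```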

The number of micro-batches? Well, it's as small as possible such that we can wrap around without leaving any idle time when we do.
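That rule can be sketched under the standard assumption that a micro-batch takes one step per pipeline stage to come back around, so a stage goes idle whenever there are fewer micro-batches in flight than stages (the function and numbers here are illustrative, not from the transcript):

```python
def pipeline_idle_steps(num_stages: int, num_microbatches: int) -> int:
    """Idle steps a stage waits per wrap-around in a simple looped
    pipeline schedule: a micro-batch takes num_stages steps to return,
    so with fewer micro-batches than stages, each stage sits idle for
    the difference."""
    return max(0, num_stages - num_microbatches)

# Four stages with the diagram's four micro-batches: no idle time.
print(pipeline_idle_steps(4, 4))  # 0
# Only two micro-batches: each stage idles two steps per loop.
print(pipeline_idle_steps(4, 2))  # 2
```

So the smallest micro-batch count with zero idle time equals the number of stages, which is why we want it "as small as possible" but no smaller.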