Reiner Pope
So we'll just fill this in. We'll have the... Nice. Yeah, so we've got the two and three.
So let's split this batch. This batch will be the global batch size. So B is going to be the number of micro-batches times the batch size per micro-batch.
So how many micro-batches do we need?
So the number of micro-batches in this diagram is four: zero, one, two, three.
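As a quick worked instance of that formula, a minimal sketch in Python; only the count of four micro-batches comes from the diagram, and the per-micro-batch size here is a hypothetical placeholder:

```python
num_micro_batches = 4      # the diagram's micro-batches 0, 1, 2, 3
micro_batch_size = 2048    # hypothetical value; the hardware sets this, not the diagram
B = num_micro_batches * micro_batch_size
print(B)                   # global batch size; 8192 with these illustrative numbers
```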
And then the micro-batch size: this is still this, like, 2000-ish number. This is the one that is, like... the, like, 2000 times sparsity. Sorry, no, this is the 300 times sparsity. Right, yes. This is going to be the 20-millisecond train step.
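Spelling out that arithmetic as a sketch; only the "300 times sparsity" relationship comes from the discussion, and the sparsity factor below is an assumed placeholder:

```python
sparsity = 8                        # assumed sparsity factor, purely for illustration
micro_batch_size = 300 * sparsity   # the "300 times sparsity" figure -> 2400 here
```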
So the global batch size is the number of micro-batches times the local batch size. The local batch size is set by this hardware parameter. And the number of micro-batches is as small as possible such that we can wrap around and not leave any idle time when we wrap around.
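A minimal sketch of that rule, assuming the usual pipeline-parallel condition that a pipeline of P stages needs at least P micro-batches in flight to keep every stage busy at wrap-around; the function name and the numbers in the example are hypothetical:

```python
def choose_global_batch_size(num_pipeline_stages: int, local_batch_size: int) -> int:
    """Global batch size = (smallest bubble-free micro-batch count) x local batch size."""
    # Smallest number of micro-batches that leaves no idle time when the schedule
    # wraps around: assumed here to be one micro-batch per pipeline stage.
    num_micro_batches = num_pipeline_stages
    return num_micro_batches * local_batch_size

# e.g. a 4-stage pipeline (matching the diagram's four micro-batches) with a
# hardware-set local batch size of 2048
print(choose_global_batch_size(4, 2048))   # -> 8192
```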