Reiner Pope
So we'll just fill this in. We'll have the... Nice. Yeah, so we've got the two and three.
So let's split this batch. This batch will be the global batch size. So B is going to be the number of micro-batches times the batch size per micro-batch.
So how many micro-batches do we need?
So the number of micro-batches in this diagram is four: zero, one, two, three.
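As a quick worked instance of that formula, a minimal sketch in Python; only the count of four micro-batches comes from the diagram, and the per-micro-batch size here is a hypothetical placeholder:

```python
num_micro_batches = 4      # the diagram's micro-batches 0, 1, 2, 3
micro_batch_size = 2048    # hypothetical value; the hardware sets this, not the diagram
B = num_micro_batches * micro_batch_size
print(B)                   # global batch size; 8192 with these illustrative numbers
```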
And then the micro-batch size: this is still this, like, 2000-ish number. This is the one that is, like... the, like, 2000 times sparsity. Sorry, no, this is the 300 times sparsity. Right, yes. This is going to be the 20-millisecond train step.
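Spelling out that arithmetic as a sketch; only the "300 times sparsity" relationship comes from the discussion, and the sparsity factor below is an assumed placeholder:

```python
sparsity = 8                        # assumed sparsity factor, purely for illustration
micro_batch_size = 300 * sparsity   # the "300 times sparsity" figure -> 2400 here
```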
So the global batch size is the number of micro-batches times the local batch size. The local batch size is set by this hardware parameter. And the number of micro-batches is as small as possible such that we can wrap around and not leave any idle time when we wrap around.
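A minimal sketch of that rule, assuming the usual pipeline-parallel condition that a pipeline of P stages needs at least P micro-batches in flight to keep every stage busy at wrap-around; the function name and the numbers in the example are hypothetical:

```python
def choose_global_batch_size(num_pipeline_stages: int, local_batch_size: int) -> int:
    """Global batch size = (smallest bubble-free micro-batch count) x local batch size."""
    # Smallest number of micro-batches that leaves no idle time when the schedule
    # wraps around: assumed here to be one micro-batch per pipeline stage.
    num_micro_batches = num_pipeline_stages
    return num_micro_batches * local_batch_size

# e.g. a 4-stage pipeline (matching the diagram's four micro-batches) with a
# hardware-set local batch size of 2048
print(choose_global_batch_size(4, 2048))   # -> 8192
```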