
Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Thank you.

So what we would like is for the scale-up time to be greater than the scale-out time, because the scale-up time is the more important and precious resource.

And so we would like this number to be greater than or equal to 1.

And this really doesn't seem hard.

There's just a factor of 8 that we need to overcome.

So we need the product of these three things to be bigger than 8.

Typically, we have a fairly large number of activated experts.

It could be eight by itself.

And then we can increase the number of layers per stage a lot until we satisfy this.
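The arithmetic in this stretch of the conversation can be sketched in a few lines. Note the hedging: the transcript names only two of the three factors explicitly (the number of activated experts, which "could be eight by itself", and the number of layers per stage); the third factor here is a hypothetical placeholder, not something the speaker identifies.

```python
# Sketch of the feasibility check described above: the product of three
# scaling factors must meet or exceed the factor of 8 that has to be
# overcome. Two factors come from the transcript (activated experts,
# layers per stage); `third_factor` is a hypothetical placeholder.

def product_exceeds_threshold(activated_experts: int,
                              layers_per_stage: int,
                              third_factor: float = 1.0,
                              threshold: float = 8.0) -> bool:
    """True when the product of the three factors is at least `threshold`."""
    return activated_experts * layers_per_stage * third_factor >= threshold

# With 8 activated experts, the threshold is met even with a single
# layer per stage and a neutral third factor, matching the transcript's
# point that 8 experts "could be eight by itself".
print(product_exceeds_threshold(activated_experts=8, layers_per_stage=1))
# And if experts alone don't suffice, raising layers per stage closes the gap:
print(product_exceeds_threshold(activated_experts=2, layers_per_stage=4))
```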

I see.

So what this ends up looking like is that I can, in fact, have an entire pipeline of racks where one rack does one layer, and then I move on to the next rack, and I do another layer, and then I move on to the next rack.

I can do another layer.

Isn't that... I feel it's interesting that the physical layout matches the model architecture, like the cutting matches the model architecture.

Yeah, exactly.

Yeah.

So I think a way to think of it, okay, the galaxy-brain way to think of it is...

Like, what are all the different dimensions in which a model is scaled up?

And so it is scaled up by layers, it is scaled up by the d_model dimension, it is scaled up by the d_ff dimension, it is scaled up by the number of experts.

Every single one of those numbers you can choose to cut along.
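The four scaling dimensions named above, any of which can be cut across hardware, can be sketched as follows. The dimension sizes are made-up illustrative values, not figures from the conversation.

```python
# Sketch: the four scaling dimensions named in the transcript, each a
# candidate axis to partition ("cut") the model along. Sizes are
# illustrative assumptions, not values from the transcript.
model_dims = {
    "layers": 64,
    "d_model": 8192,
    "d_ff": 32768,
    "num_experts": 64,
}

def cut(dim_size: int, num_devices: int) -> int:
    """Per-device share when cutting one dimension evenly across devices."""
    assert dim_size % num_devices == 0, "cut must divide the dimension evenly"
    return dim_size // num_devices

# Cutting the layer dimension across 8 pipeline stages, as in the
# rack-per-layer picture above, gives 8 layers per stage.
print(cut(model_dims["layers"], 8))
```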