Reiner Pope
๐ค SpeakerAppearances Over Time
Podcast Appearances
So I'll just make that claim and move on.
So we're going to say that we have the cost of training and the cost of inference, and we want to equalize these.
We'll do pre-training only first because it's a little... well, actually, we can do all of it in general. So we'll break it down piece by piece.
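In symbols (my notation, not the speaker's), the claim is:

    C_train = C_pretrain + C_RL ≈ C_inference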
Cost of pre-training: the number of active params times the amount of pre-training data. So that's the cost of pre-training. There's a factor of six out front, which is the number of FLOPs per parameter per token. This is the famous 6ND formula: C_pretrain = 6 · N_active · D_pretrain.
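As a quick sanity check on the 6ND arithmetic, here's a minimal sketch; the model size and token count below are hypothetical, not numbers from the talk.

```python
def pretrain_flops(n_active_params: float, n_tokens: float) -> float:
    """6ND rule of thumb: roughly 6 FLOPs per active parameter per token."""
    return 6.0 * n_active_params * n_tokens

# Hypothetical example: 70B active params trained on 15T tokens.
print(f"{pretrain_flops(70e9, 15e12):.1e} FLOPs")  # ~6.3e24
```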
And then in RL, we have approximately the same thing: the same number of active parameters, but now the amount of data is the RL data. And there's this extra multiplier, which is an inefficiency multiplier. So C_RL ≈ 6 · N_active · D_RL · (inefficiency multiplier).
Well, yeah, there's that.
And then the other, perhaps even bigger, inefficiency is that RL involves a substantial amount of decode, and decode often runs at lower MFU than training.
It would be at least 2, somewhere in the range of 2 to 6. So we'll just say 2 to 6 and leave it at that.
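Putting the RL term together, here's the same back-of-envelope sketch with the inefficiency multiplier left as a free parameter; the token count and model size are illustrative assumptions, and the 2-6x range is the one given above.

```python
def rl_flops(n_active_params: float, rl_tokens: float,
             inefficiency: float) -> float:
    """RL compute: 6ND applied to the RL tokens, scaled by an
    inefficiency multiplier (~2-6x, largely because decode runs
    at lower MFU than training)."""
    return 6.0 * n_active_params * rl_tokens * inefficiency

# Hypothetical: same 70B-active model, 1T RL tokens.
low = rl_flops(70e9, 1e12, inefficiency=2.0)   # ~8.4e23 FLOPs
high = rl_flops(70e9, 1e12, inefficiency=6.0)  # ~2.5e24 FLOPs
```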
Yeah.