Reiner Pope

Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

And then we can add in the inference cost.

The inference cost is 2 times the number of active parameters times the data in inference.
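
The inference term described here can be written as a short calculation. This is a minimal sketch, assuming cost is counted in FLOPs and "data in inference" means the number of tokens served; the function and parameter names are hypothetical and do not come from the episode.

```python
def inference_cost(active_params: float, inference_tokens: float) -> float:
    """Approximate inference cost as 2 * active parameters * tokens served.

    The factor of 2 is the one quoted in the transcript; treating the
    result as FLOPs is an assumption of this sketch.
    """
    return 2.0 * active_params * inference_tokens


# Illustrative numbers only (not from the episode).
example_cost = inference_cost(active_params=1e10, inference_tokens=1e12)
print(f"{example_cost:.1e}")
```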

Thank you.

And then inference is just two.

Yeah.

So we're going to solve for, essentially, equality of all three of these terms.

That is...

ballpark where people are going to be. Labs have more information on what is productive in doing more RL, for example, versus doing more pre-training. I don't have that information, but I think a good ballpark is a 33% split between each of them.

Actually, I'm not sure I understand the intuition for that. Another naive model could have been that RL plus pre-training would be 50%.

Yeah, that's also a valid answer as well.

Because this is heuristic, I can't really argue for one versus the other.

They don't differ by that much.

Like 33 versus 25 is only a small factor off.
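
To make the "small factor" concrete, the two splits can be compared directly. This is a rough sketch, assuming the alternative split gives inference 50% and divides the remaining 50% evenly between pre-training and RL (25% each); that even division is an assumption, not something stated in the episode.

```python
# Share of total compute given to pre-training under each heuristic.
equal_three_way_share = 1 / 3   # pre-training, RL, and inference all equal
half_training_share = 0.5 / 2   # assumed: pre-training and RL split 50% evenly

# How far apart the two heuristics are, as a multiplicative factor.
ratio = equal_three_way_share / half_training_share
print(f"{ratio:.2f}")  # prints 1.33
```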

So let's pick one of them.

All equal seems simple enough.

And so we're just going to solve for equality of them.

It's pretty straightforward.

We can immediately see that the number of activated parameters totally disappears.
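
The cancellation can be checked numerically. This is a sketch under stated assumptions: the pre-training cost is taken to be 6 times active parameters times pre-training tokens, a common approximation that is not quoted in this excerpt, and the RL term is left out because its formula does not appear here. All names are hypothetical.

```python
import math


def inference_tokens_for_equal_cost(
    pretrain_tokens: float,
    train_factor: float = 6.0,   # assumed pre-training FLOPs per parameter per token
    infer_factor: float = 2.0,   # factor of 2 quoted in the transcript
) -> float:
    """Solve train_factor * N * D_pre == infer_factor * N * D_inf for D_inf.

    The active-parameter count N appears on both sides and cancels,
    so it is not an argument of this function.
    """
    return pretrain_tokens * train_factor / infer_factor


pretrain_tokens = 1e12  # illustrative value only
inference_tokens = inference_tokens_for_equal_cost(pretrain_tokens)

# The two costs match for any model size, which shows N drops out.
for active_params in (1e9, 1e11):
    pretrain_cost = 6.0 * active_params * pretrain_tokens
    inference_cost_equal = 2.0 * active_params * inference_tokens
    assert math.isclose(pretrain_cost, inference_cost_equal)
```

Under these assumptions, the result depends only on the ratio of the two constant factors, not on the number of active parameters.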

And so let's factor that out.

And we're going to just say that data in pre-training

I decided to do it your way.