Reiner Pope

👤 Speaker

1157 total appearances

Podcast Appearances

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And then we can add in the inference cost.

5008.45 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

The inference cost is 2 times the number of active parameters times the data in inference.

5009.932 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Thank you.

5037.272 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And then inference is just two.

5037.812 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Yeah.

5039.515 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

So we're going to solve for essentially, maybe, equality of all three of these terms.

5040.016 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

That is...

5044.303 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

ballpark where people are going to be. Labs have more information on what is productive in doing more RL, for example, versus doing more pre-training. I don't have that information, but I think a good ballpark is roughly a 33% split between each of them. Actually, I'm not sure I understand the intuition for that. Another naive model could have been that RL plus pre-training would be 50%.

5044.964 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Yeah, that's also a valid answer as well.

5069.977 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Because this is heuristic, I can't really argue for one versus the other.

5073.868 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

They don't differ by that much.

5076.936 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Like 33 versus 25 is only a small factor off.

5078.099 View full episode →
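To make the "33 versus 25" comparison concrete, here is a minimal sketch of the two budget splits mentioned above. It assumes that in the alternative model the 50% training share is divided evenly between RL and pre-training; that even division is an assumption, inferred from the 25% figure and not stated outright in the excerpt. The dictionary names are hypothetical.

```python
from fractions import Fraction

# Split 1 (from the excerpt): pre-training, RL, and inference each get an equal share.
equal_split = {
    "pretraining": Fraction(1, 3),
    "rl": Fraction(1, 3),
    "inference": Fraction(1, 3),
}

# Split 2 (alternative from the excerpt): RL plus pre-training together get 50%.
# Assumption: that half is divided evenly, giving 25% each.
training_half_split = {
    "pretraining": Fraction(1, 4),
    "rl": Fraction(1, 4),
    "inference": Fraction(1, 2),
}

# Both splits must account for the whole budget.
assert sum(equal_split.values()) == 1
assert sum(training_half_split.values()) == 1

# How far apart the two heuristics are for the pre-training share.
pretraining_ratio = equal_split["pretraining"] / training_half_split["pretraining"]
print(f"pre-training share differs by a factor of {float(pretraining_ratio):.2f}")
```

Under these assumptions the two heuristics disagree on the pre-training share by a factor of 4/3, which is the sense in which the choice between them matters little.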

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

So let's pick one of them.

5084.283 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

All equal seems simple enough.

5087.809 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And so we're just going to solve for equality of them.

5090.192 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

It's pretty straightforward.

5093.778 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

We can immediately see that the number of activated parameters totally disappears.

5094.679 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And so let's factor that out.

5097.624 View full episode →
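The cancellation described above can be sketched for two of the three terms. The inference coefficient of 2 comes from the excerpt; the pre-training coefficient of 6 FLOPs per active parameter per token is the commonly used approximation and is an assumption here, since the excerpt does not state it. The RL term is left out because its form is not given in this excerpt. All function and variable names are hypothetical.

```python
from fractions import Fraction

# Assumption: about 6 FLOPs per active parameter per pre-training token
# (forward plus backward pass). Not stated in the excerpt.
PRETRAIN_COEFF = 6
# From the excerpt: 2 FLOPs per active parameter per inference token.
INFERENCE_COEFF = 2


def pretrain_flops(n_active, pretrain_tokens):
    """Approximate pre-training cost in FLOPs."""
    return PRETRAIN_COEFF * n_active * pretrain_tokens


def inference_flops(n_active, inference_tokens):
    """Approximate inference cost in FLOPs."""
    return INFERENCE_COEFF * n_active * inference_tokens


def inference_tokens_for_equal_cost(pretrain_tokens):
    """Solve 6 * N * D_pre == 2 * N * D_inf for D_inf.

    N appears on both sides, so it cancels and the answer
    depends only on the pre-training token count.
    """
    return Fraction(PRETRAIN_COEFF, INFERENCE_COEFF) * pretrain_tokens


pretrain_tokens = 10**12  # hypothetical pre-training token count
for n_active in (10**9, 10**10, 10**11):  # hypothetical model sizes
    d_inf = inference_tokens_for_equal_cost(pretrain_tokens)
    # The two costs match for every model size, confirming that N drops out.
    assert pretrain_flops(n_active, pretrain_tokens) == inference_flops(n_active, d_inf)
    print(n_active, d_inf)
```

With these coefficients, equal cost means serving three times as many inference tokens as pre-training tokens, whatever the active parameter count.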

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And we're going to just say that data in pre-training

5099.026 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

I decided to do it your way.

5104.651 View full episode →
