Dwarkesh Patel

That's why RL, where you're definitely going to generate all the trajectories but might or might not train on all of them, is two to six.

5029.499

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served
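As a rough illustration of the "two to six" range in the quote above, here is a minimal sketch. It assumes the usual rule of thumb of about 2 FLOPs per parameter per token for a forward pass and about 4 more for a backward pass, and it counts each generated token's forward pass only once, which appears to be the accounting in the quote. The function name and the `train_fraction` parameter are hypothetical, introduced only for this example.

```python
# Assumed rule-of-thumb costs, in FLOPs per parameter per token.
FORWARD_FLOPS = 2.0   # generating (or scoring) a token
BACKWARD_FLOPS = 4.0  # extra cost when the token is also trained on


def rl_flops_per_param_token(train_fraction: float) -> float:
    """Approximate FLOPs per parameter per generated token in RL.

    train_fraction is the (hypothetical) share of generated trajectories
    that also receive a backward pass: 0.0 means none, 1.0 means all.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    # Every trajectory is generated; only a fraction is trained on.
    return FORWARD_FLOPS + BACKWARD_FLOPS * train_fraction


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, rl_flops_per_param_token(fraction))
```

Under these assumptions the multiplier is 2 when no trajectories are trained on and 6 when all of them are, with partial training landing in between.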

Yes.

5035.849

Yeah.

5036.31

And inference would be 50%.

5068.994

If both of them are 1 in 10, that kind of implies that there's never a backward pass on RL?

5181.119

So this is like 1.5 and this is one. Billions of dollars of the compute just flowed the other direction.

5252.759

Right.

5260.987

But then, so it looks... Sorry, I'm making a basic algebra mistake.

5295.136

It seems like there should be less RL tokens than pre-training tokens?

5299.081

This is quite interesting.

5317.706
