Sholto Douglas

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And so maybe the way I would define it now is the thing that's holding them back is if you can give it a good feedback loop for the thing that you want it to do, then it's pretty good at it.

221.498 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

If you can't, then they struggle a bit.

233.355 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Yes.

241.987 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

So the big thing that really worked over the last year is –

243.008 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Maybe broadly, the domain is called RL from verifiable rewards or something like this, where a clean reward signal.

247.098 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

So the initial unhoppling of language models was RL from human feedback, where typically it was something like pairwise feedback or something like this, and the outputs of the models became closer and closer to things that humans wanted.

253.69 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

But this doesn't necessarily improve their performance at any level.

265.853 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

like difficulty of problem domain, right?

269.66 View full episode →

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Particularly as humans are actually quite bad judges of what a better answer is.