Sholto Douglas

So if you look at the old papers from 2017 (it's not that old, but still), the learning curves always look flat, flat, flat, flat, flat as they're figuring out the basic mechanics of the world.

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And then there's this spike up as they learn to exploit easy rewards.

And then it's almost like a sigmoid in some respects.

And then it continues on indefinitely as it just learns to absolutely maximize the game.

And I think the LLM curves look a bit different in that there isn't that dead zone at the beginning.

Because they already know how to solve some of the basic tasks.

And so you get this initial spike.

And that's what people are talking about when they're like, oh, you can learn from one example.

That one example is just teaching you to pull out the backtracking and to format your answer correctly, the kind of stuff that lets you get some reward initially on tasks, conditional on your pre-training knowledge.

And then the rest probably is you learning more and more complex stuff.
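
[Editor's sketch: the two curve shapes described above can be illustrated with a toy model. This is only an illustration, not data from any paper or training run. The function names, the logistic form, and every constant (midpoints, steepness, the 0.4/0.6 split) are hypothetical choices made to reproduce the qualitative picture: a long flat region followed by a spike for a from-scratch agent, versus an immediate jump followed by slower gains for a pre-trained model. Note that a logistic saturates at 1.0, so it does not capture the "continues on indefinitely" part.]

```python
import math


def logistic(step, midpoint, steepness):
    """Standard logistic (sigmoid) curve with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (step - midpoint)))


def from_scratch_reward(step):
    """Toy curve for an agent trained from scratch (hypothetical constants).

    Reward stays near zero for a long "dead zone", then rises sharply
    around the midpoint once easy rewards become reachable.
    """
    return logistic(step, midpoint=600, steepness=0.02)


def pretrained_reward(step):
    """Toy curve for a pre-trained model (hypothetical constants).

    A fast early term stands in for cheap gains such as answer formatting;
    a slower logistic term stands in for learning harder behavior.
    """
    early = 0.4 * (1.0 - math.exp(-step / 20.0))
    slow = 0.6 * logistic(step, midpoint=500, steepness=0.01)
    return early + slow


if __name__ == "__main__":
    print(f"{'step':>6} {'from_scratch':>14} {'pretrained':>12}")
    for step in range(0, 1201, 100):
        print(
            f"{step:>6} {from_scratch_reward(step):>14.3f} "
            f"{pretrained_reward(step):>12.3f}"
        )
```

[In this sketch the from-scratch curve is still near zero at step 100, while the pre-trained curve has already collected most of its early term. That is the "no dead zone" difference being described.]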

Yeah, it's like off the curve.

Totally, yeah.