Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing

Sholto Douglas

👤 Person
1567 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And that's, like, the complexity in some respects.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

In, like, the alpha variants, or maybe you were about to say this, like, one player always wins.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

So you always get a reward signal one way or the other.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

But in the kinds of things we're talking about, you need to actually succeed at your task sometimes.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

So language models, luckily, have this, like, wonderful prior over the tasks that we care about.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

So if you look at all the old papers from 2017, it's not that old, but the papers from 2017, the learning curves always look like flat, flat, flat, flat, flat as they're figuring out basic mechanics of the world.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And then there's this spike up as they learn to exploit easy rewards.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And then it's almost like a sigmoid in some respects.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And then it continues on indefinitely as it just learns to absolutely maximize the game.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And I think the LLM curves look a bit different in that there isn't that dead zone at the beginning.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Because they already know how to solve some of the basic tasks.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And so you get this initial spike.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And that's what people are talking about when they're like, oh, you can learn from one example.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

That one example is just teaching you to pull out the backtracking and formatting your answer correctly and this kind of stuff that lets you get some reward initially at tasks, conditional on your pre-training knowledge.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And then the rest probably is you learning more and more complex stuff.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Yeah, it's like off the curve.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Totally, yeah.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

It's like a tug in the grain.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Only if you get feedback.

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Yeah.