
Sholto Douglas

👤 Speaker
1567 total appearances


Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And we would prefer to get learning on the job.

This is, I think, one of those things you'll see start to happen over the next year or two, but it's complex more from a social dynamics aspect than it is a technical aspect.

And if you created the Dwarkesh podcast RL feedback loop, then the models would get incredible at whatever you wanted them to do, I suspect.

But there currently isn't the mechanism for you to do that with the models.

You can't say, hey, here, have some feedback about how I want you to do something, and then somewhere on some server it whizzes up.

Currently, there's a text-based memory where it goes and records things about what you wanted.

And it puts it in the prompt.

It tries to build its own scaffolding and context.
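A minimal sketch of the text-based memory approach described here: feedback is kept as plain text and injected into the prompt on later turns, rather than updating the model's weights. All class and function names below are illustrative, not any real product's API.

```python
class TextMemory:
    """Toy text-based memory: remembered preferences are prepended to the prompt."""

    def __init__(self):
        self.notes = []  # plain-text records of what the user wanted

    def record(self, note: str) -> None:
        # A "no, not like that" moment would be captured as a note like this.
        self.notes.append(note)

    def build_prompt(self, user_message: str) -> str:
        # Scaffolding step: fold the memory into the context window as text.
        memory_block = "\n".join(f"- {n}" for n in self.notes)
        return (
            "Things the user has previously asked for:\n"
            f"{memory_block}\n\n"
            f"User: {user_message}"
        )


memory = TextMemory()
memory.record("Prefers concise answers with code examples.")
prompt = memory.build_prompt("Explain Python decorators.")
```

The point of the sketch is the contrast being drawn in the conversation: this loop only ever edits the prompt text, so the "raw base intelligence" is unchanged; the alternative would be converting that same feedback into a weight update.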

I think an interesting question over the next few years is whether that is totally sufficient, whether raw base intelligence plus sufficient scaffolding in text is enough to build context, or whether you need to somehow update the weights for your use case, or some combination thereof.

But so far, we've only explored the first.

It's a good question.

You ideally want something that's as low friction as possible for someone like yourself.

You're having a conversation, and you say, no, not like that.

You want some alert to flip and be like, hey, OK, we can convert this into something we could learn from.

That's complex and tricky, and there's a lot of subtleties in how to do that.

I mean, the OpenAI sycophancy stuff is one example of this, where you'd think thumbs up and thumbs down are a good indication of what is good in a response.

But actually, thumbs up can be a pretty terrible reward signal for a model.

And in the same way, when Claude is doing coding for me, I'll actually often...

Sometimes I'm there just accepting its suggestions, and sometimes it actually does pretty much the right thing.

And I'm just like, oh, it's 90% of the way there, but not perfect.