Sholto Douglas

Speaker
1567 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? - Sholto Douglas & Trenton Bricken

Yes, I mean, I agree.

Like, the case where you end up with, like, two national projects facing off against each other is dramatically worse.

Right.

Like, we don't want to live in that world.

Much better if there's, like,

It stays a free market, so to speak.

Yeah, yeah, yeah.

I mean, like a continuous distribution of this stuff.

One important mental model to think about RL is I think as the task gets more complex,

There is some respect in which longer-horizon tasks, if you can do them, if you can ever get that reward, are easier to judge.

So again, let's come back to that, can you make money on the internet?

That's an incredibly easy reward signal to judge.

But to do that, there's a whole hierarchy of complex behavior.

So if you could pre-train up to the easy-to-judge reward signals: does your website work?

Does it go down?

Do people like it?

There's all these reward signals that we can respond to because we can progress through these long enough trajectories to actually get to interesting things.

If you're stuck in this regime where

you need a reward signal every five tokens, it's way more painful and a long process. But if you could pre-train on every screen in America, then probably the RL tasks that you can design are very different to if you could only take the existing internet as it is today. And so how much of that you get access to changes the mix. Interesting.

I mean, that's definitely one of the big complexities, right?