Sholto Douglas

👤 Speaker

1567 total appearances

Podcast Appearances

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Yes, I mean, I agree.

7810.054

Like, the case where you end up with, like, two national projects facing off against each other is dramatically worse.

7811.717

Right.

7817.124

Like, we don't want to live in that world.

7818.165

Much better if there's, like,

7820.828

It stays a free market, so to speak.

7822.15

Yeah, yeah, yeah.

7825.094

I mean, like a continuous distribution of this stuff.

7854.764

One important mental model for thinking about RL is, I think, as the task gets more complex,

7858.672

There is some respect in which longer-horizon tasks, if you can do them, if you can ever get that reward, are easier to judge.

7866.7

So again, let's come back to that: can you make money on the internet?

7875.875

That's an incredibly easy reward signal to judge.

7878.86

But to do that, there's a whole hierarchy of complex behavior.

7882.225

So if you could pre-train up to the easy-to-judge reward signals: does your website work?

7885.711

Does it go down?

7889.778

Do people like it?

7890.48

There are all these reward signals that we can respond to, because we can progress through long enough trajectories to actually get to interesting things.

7891.121

If you're stuck in this regime where

7900.758

you need a reward signal every five tokens, it's way more painful and a long process. But if you could pre-train on every screen in America, then probably the RL tasks that you can design are very different to if you could only take the existing internet as it is today. And so how much of that you get access to changes the mix. Interesting.

7902.982

I mean, that's definitely one of the big complexities, right?

7961.926