Sholto Douglas

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

When you look at METR's evals of can the model solve the task, they're there solving them for like hours.

5791.314

over multiple iterations.

5797.302

And eventually one of them is like, oh yeah, I've come back and I've solved the task.

5799.627

Me, at the moment, at least maybe the fault is my own, but I try the model on doing something and if it can't do it, I'm like, okay, fine, I'll do it.

5802.553

But this more async form factor, I expect to, like, really quite dramatically improve the experience of these models.

5848.399

Interesting.

5853.546

Or you can just say, like, let's see if it can do that.

5853.806

The intellectual ceiling is really high.

5900.219

I think one important point is that

5948.473

When you look at AlphaZero, it does have all of those ingredients.

5952.89

And in particular, I think the intellectual ceiling goes quite contra what I was saying before, which is we've demonstrated this incredible complexity of math and programming problems.

5955.276

I do think that the type of task and setting that AlphaZero worked in, this two-player perfect information game, basically, is incredibly friendly to RL algorithms.

5965.546

And the reason it took so long

5980.069

to get to more proto-AGI style models is you do need to crack that general conceptual understanding of the world and language and this kind of stuff.

5983.855

And you need to get the initial reward signal on tasks that you care about in the real world, which are harder to specify than games.

5993.344

And I think then, that sort of gradient signal that comes from the real world, all of a sudden you get access to it and you can start climbing it.

6001.692

Whereas AlphaZero didn't ever have the first rung to pull on.

6010.18

I would be extremely surprised if that was the case.

6043.186

And I think that would be somewhat of an update towards there's something strangely difficult about this, like computer use in particular.