Sholto Douglas

Speaker
1567 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? - Sholto Douglas & Trenton Bricken

When you look at METR's evals of whether the model can solve the task, they're there solving them for, like, hours over multiple iterations.

And eventually one of them is like, oh yeah, I've come back and I've solved the task.

Me, at the moment, at least maybe the fault is my own, but I try the model on doing something and if it can't do it, I'm like, okay, fine, I'll do it.

But this more async form factor, I expect to, like, really quite dramatically improve the experience of these models.

Interesting.

Or you can just say, like, let's see if it can do that.

The intellectual ceiling is really high.

I think one important point is that

When you look at AlphaZero, it does have all of those ingredients.

And in particular, I think the intellectual ceiling goes quite contra what I was saying before, which is we've demonstrated this incredible complexity of math and programming problems.

I do think that the type of task and setting that AlphaZero worked in, this two-player perfect information game, basically, is incredibly friendly to RL algorithms.

And the reason it took so long to get to more proto-AGI style models is you do need to crack that general conceptual understanding of the world and language and this kind of stuff.

And you need to get the initial reward signal on tasks that you care about in the real world, which are harder to specify than games.

And I think then that sort of gradient signal that comes from the real world, all of a sudden you get access to it and you can start climbing it.

Whereas AlphaZero didn't ever have the first rung to pull on.

I would be extremely surprised if that was the case.

And I think that would be somewhat of an update towards there being something strangely difficult about this, like computer use in particular.

I don't know if it's the bust timeline, but I would definitely update on this being a lengthening of timelines.