Sholto Douglas
When you look at METR's evals of whether the model can solve the task, they're there solving them for, like, hours over multiple iterations.
And eventually one of them is like, oh yeah, I've come back and I've solved the task.
Me, at the moment, at least maybe the fault is my own, but I try the model on doing something and if it can't do it, I'm like, okay, fine, I'll do it.
But this more async form factor, I expect to, like, really quite dramatically improve the experience of these models.
Interesting.
Or you can just say, like, let's see if it can do that.
The intellectual ceiling is really high.
I think one important point is that when you look at AlphaZero, it does have all of those ingredients.
And in particular, I think the intellectual ceiling goes quite contra what I was saying before, which is we've demonstrated this incredible complexity of math and programming problems.
I do think that the type of task and setting that AlphaZero worked in, this two-player perfect information game, basically, is incredibly friendly to RL algorithms.
And the reason it took so long to get to more proto-AGI style models is you do need to crack that general conceptual understanding of the world and language and this kind of stuff.
And you need to get the initial reward signal on tasks that you care about in the real world, which are harder to specify than games.
And I think then that sort of gradient signal that comes from the real world, all of a sudden you get access to it and you can start climbing it.
Whereas AlphaZero didn't ever have the first rung to pull on.
MARK MANDELMANN- I would be extremely surprised if that was the case.
And I think that would be somewhat of an update towards there's something strangely difficult about this, like computer use in particular.
I don't know if it's the bust timeline, but I would definitely update on this being a lengthening of timelines.