Sholto Douglas
๐ค SpeakerAppearances Over Time
Podcast Appearances
And we would prefer that we get like learning on the job.
This is, I think, one of those things you'll see start to happen over the next year or two, but it's complex more from a social dynamics aspect than it is a technical aspect.
And if you created the Dworkesh podcast RL feedback loop, then the models would get incredible at whatever you wanted them to do, I suspect.
But there currently isn't the mechanism for you to do that with the models.
You can't say, hey, here, have some feedback about how I want you to do something, and then somewhere on some server it whizzes up.
Currently, there's a text-based memory where it goes and forwards things about what you wanted.
And it puts it in the prompt.
It tries to build its own scaffolding and context.
I think an interesting question over the next few years is whether that is totally sufficient, whether you just like this raw base intelligence plus sufficient scaffolding in text is enough to build context, or whether you need to somehow update the weights for your use case, or some combination thereof.
But so far, we've only explored the first.
It's a good question.
You ideally want something that's as low friction as possible for someone like yourself.
You're having a conversation, and you say, no, not like that.
You want some alert to flip and be like, hey, OK, we can convert this into something we could learn from.
That's complex and tricky, and there's a lot of subtleties in how to do that.
I mean, like, the opening-eye sycophancy stuff is, like, one example of this where you'd think, like, thumbs up and thumbs down are a good indication of, like, what is good in a response.
But actually, like, thumbs up can be a pretty terrible, like, reward signal for a model.
And in the same way, like, when Claude is doing coding for me, I'll actually often, like...
Sometimes I'm there just accepting its suggestions, but sometimes it actually does pretty much the right thing.
And I'm just like, oh, it's like 90% of the way there, but not perfect.