Rob Wiblin
Podcast Appearances
Joshua wants to put scaffolding around the prediction model, asking it different questions at each stage to effectively assemble it into a capable agent while keeping it just as honest as it was before.
We'll then hopefully be able to have our cake and eat it too, getting the highly capable agents that businesses are craving and demanding and insisting on, while still being confident that those agents are being completely direct with us.
Joshua thinks that these agents perhaps might even be more capable as well, thanks to a superior reasoning process, or at the very least, a clearer and more explainable one.
It's fair to say that this proposal is huge if true, or at least huge if it will work.
And of course, not everyone is sold on that idea, as Joshua and I will discuss later.
Okay, that's the shape of things to come.
The technical discussion continues for a while, but if you decide you want to skip that, the second half of the conversation stands very well on its own, starting with the chapter "How much would this cost?"
All right, on with the show.
And how would you train a model like that?
Yeah, so what are all of the ways that you think that the models that we're currently racing to build now are unsafe?
And why would this kind of model, which you call Scientist AI, be different and better?
I'm a little bit surprised that you're foregrounding the potential for it to come up with kind of implicit goals during pre-training, the predict-the-next-word stage where it learns to mimic humans, because we're investing an enormous amount of effort in making them extremely proactive agents with very explicit goals.
That seems to me like where I'd be most worried about things going awry.
I'm worried about both.
Yeah, what sort of training dataset would you need to make and then how would you turn that into a model?
So maybe you can explain if I've got the right picture of how this would work.
You put together a huge dataset of all of the things that people have said and where they said it and who was speaking and when.
And then, I guess, in the same database, you've also got a set of things that you establish as true, like statements that you're just going to say, this is the ground truth that we're going to try to predict.
And then you try to use the speech acts, the things that were said, to predict the things that you are claiming are true.
And so it builds a world model internally where you can feed in statements and it will give you a probability that that thing is true in the world model that it has.
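To make that picture concrete, here is a toy sketch in Python. It is not Joshua's method or any real training pipeline. Every name in it (SpeechAct, fit_reliability, prob_true) and every data row is a made-up assumption for illustration. It records who said what, where and when, scores each speaker against a small set of ground-truth statements, and then turns that into a probability for a new statement by adding up log-odds, which is a crude independence assumption rather than a learned world model.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechAct:
    """One recorded statement plus its context (all fields hypothetical)."""
    speaker: str  # who said it
    venue: str    # where it was said
    year: int     # when it was said
    claim: str    # the statement that was asserted


def fit_reliability(acts, ground_truth):
    """Per speaker, estimate P(claim is true | speaker asserted it).

    Counts start at one true and one false (Laplace smoothing) so that
    no speaker ends up with a probability of exactly 0 or 1.
    """
    counts = defaultdict(lambda: [1, 1])  # [true_count, false_count]
    for act in acts:
        if act.claim in ground_truth:
            index = 0 if ground_truth[act.claim] else 1
            counts[act.speaker][index] += 1
    return {s: t / (t + f) for s, (t, f) in counts.items()}


def prob_true(claim, acts, reliability, prior=0.5):
    """Add the log-odds of every speaker who asserted the claim.

    Treating speakers as independent is a simplifying assumption.
    """
    log_odds = math.log(prior / (1 - prior))
    for act in acts:
        if act.claim == claim:
            r = reliability.get(act.speaker, 0.5)
            log_odds += math.log(r / (1 - r))
    return 1 / (1 + math.exp(-log_odds))


# Made-up speech acts: the things that were said, with context.
acts = [
    SpeechAct("alice", "journal", 2020, "water boils at 100C at sea level"),
    SpeechAct("alice", "journal", 2021, "the earth orbits the sun"),
    SpeechAct("bob", "forum", 2020, "the earth is flat"),
    SpeechAct("bob", "forum", 2021, "the moon is made of cheese"),
    SpeechAct("alice", "journal", 2022, "claim X"),
    SpeechAct("bob", "forum", 2022, "claim Y"),
]

# Made-up ground truth: the statements we have decided to treat as settled.
ground_truth = {
    "water boils at 100C at sea level": True,
    "the earth orbits the sun": True,
    "the earth is flat": False,
    "the moon is made of cheese": False,
}

reliability = fit_reliability(acts, ground_truth)
p_x = prob_true("claim X", acts, reliability)
p_y = prob_true("claim Y", acts, reliability)
```

In this toy data, the speaker whose labelled statements were true lends more credibility to an unlabelled claim than the speaker whose labelled statements were false, and a claim nobody asserted stays at the prior.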