Will Douglas Heaven
But they saw that one part of the model was talking when you asked it, "Is a banana yellow?"
And another part was talking when you asked it, "What color is a banana?"
I spent a long time talking about what color a banana is.
But the point of this tiny example is that that's so unlike what you might expect it to do.
And so when we are using these machines and studying these machines, we need to be careful not to fall into the trap of thinking that they are like people and that they will always behave and answer questions the same way people do.
Because even that trivial example shows that they don't.
They behave in weirder ways than you might expect.
And in certain cases, that's really, really going to matter.
One of the scientists I was talking to who was working on that problem said something like: while we're testing our model and asking it what it would do in certain situations, all the answers it gives us... If you ask it, basically, "If this happened, can you promise me you won't be evil?"
It might say yes. But maybe if you asked the question in a slightly different way, a different part of its brain would be working and it would say, "Hmm, yeah, not so sure."
So, yeah, I'm going to take this one carefully, because there is a very, very polarized debate about to what extent these models are dangerous, to what extent they may, you know...
Some people genuinely worry about some kind of uprising: robots rising up and taking over.
Yeah, that is no longer science fiction for some people.
And there are people who are very worried about the trajectory we're on.
Yeah, that's terrifying.
That sentence right there is something that I didn't think we'd hear.
But yes.