Yoshua Bengio
Podcast Appearances
So there are many such experiments.
This is just one, and it's been done in many companies, including outside the labs by independent organizations.
So there's a real phenomenon.
I think it needs more study, and there are critics of the methodologies, but there are too many pieces of evidence to just ignore.
One interesting aspect of these experiments is that when you ask the AI why they did that, they lie. They pretend, "Oh, I don't know, it's not me," or something, trying to put the blame on someone else. It's great moral character here. And basically, they're deceptive. There's also a variant similar to what you talked about,
where the only option the AI really has to not die is to kill the CTO, or actually, the lead engineer.
The person happens to be stuck in a room and the AI can control the climate controls for the room and they can basically cook that person.
It does, but I think it's a bit broader than that.
So first, what's called the pre-training phase, where most of the training takes place, is imitating what humans write based on what they have already written.
And it means imitating human behavior, because our words are our actions.
Of course, humans don't want to die.
Humans are willing to lie to protect themselves.
They're willing to deceive and all these things.
And a lot of literature is about all these bad things happening.
And then the other aspect is the reinforcement learning, where they learn to strategize and to achieve goals.
And to achieve goals, often you need to go through steps, sub-goals.