Yoshua Bengio
Podcast Appearances
How do we avoid bad intentions?
We can avoid all intentions.
So we can build a machine that is like the laws of physics, that can make very good predictions, understands how the world works, but is not a person, has no goal, is just a really good model of the world, like a really smart encyclopedia.
This is only the starting point.
Can we build a machine that we totally trust and knows a lot, understands a lot, can reason and answer our questions like a perfect oracle?
It would be a probabilistic oracle, so it doesn't need to be certain about things.
It's not trying to please us.
It's just trying to be totally honest, which means it's going to give us numbers, the 10% probability, 100% probability, whatever.
Not 100% in general, like 50%, whatever.
Okay, so we can now use this as part of a system that actually acts in the world.
For example, companies already use what they call monitors, guardrails, so these pieces of code which sit on top of their neural net agent and check that either the queries that the AI gets or the answers are kosher in some way, like it's not an answer about building a bomb or whatever.
The problem is these current guardrails don't work that great, but to do the job of the guardrail, you don't need to have an AI that is an agent that has plans.
It just needs to be really good at predicting the consequences of actions.
You can ask it, what's the probability that this action, this output that the AI is about to produce is going to cause some categories of harm?
If the probability is above a threshold, you can just reject that action.
So right now we do get that in our interactions.
Sometimes the AI says, I'm sorry, I can't answer.
But we need that process to be a lot stronger.
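The threshold check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's implementation: the names `guardrail`, `HarmPredictor`, and `toy_predictor`, the harm category `physical_harm`, and the default threshold of 0.1 are all hypothetical. The toy predictor is a keyword stand-in for the non-agentic probabilistic oracle, which in practice would be a learned model.

```python
from typing import Callable, Dict

# Hypothetical interface: a predictor maps a proposed action (here, the text
# the AI is about to output) to an estimated probability of harm per category.
HarmPredictor = Callable[[str], Dict[str, float]]


def guardrail(action: str, predict_harm: HarmPredictor, threshold: float = 0.1) -> bool:
    """Return True if the action is allowed, False if it is rejected.

    The action is rejected when the estimated probability of any harm
    category is above the threshold. The guardrail itself has no goals or
    plans; it only queries the predictor and compares numbers.
    """
    probabilities = predict_harm(action)
    return all(p <= threshold for p in probabilities.values())


def toy_predictor(action: str) -> Dict[str, float]:
    # Stand-in for the oracle (assumption): a keyword check used purely
    # for illustration, not a real harm model.
    risky = "bomb" in action.lower()
    return {"physical_harm": 0.9 if risky else 0.01}


if __name__ == "__main__":
    for candidate in ["Here is a bread recipe.", "Here is how to build a bomb."]:
        verdict = "allow" if guardrail(candidate, toy_predictor) else "reject"
        print(f"{verdict}: {candidate}")
```

Keeping the predictor separate from the decision rule mirrors the point made in the talk: the component that judges consequences only has to predict well, while the accept or reject decision is a simple comparison against a threshold chosen by whoever deploys the system.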