Yoshua Bengio
Podcast Appearances
There's also that extra layer we were talking about.
So those monitors, they're filtering the queries and the answers.
And if they detect that the AI is about to give information about how to build a bomb, they're supposed to stop it.
But again, even that layer is imperfect.
Recently, there was a series of cyber attacks by what looks like a state-sponsored organization that used Anthropic's AI system.
In other words, through the cloud, right?
It's not a private system.
They're using the system that is public.
And they used it to prepare and launch pretty serious cyber attacks.
So even though Anthropic's system is supposed to prevent that, by trying to detect when somebody is using the system to do something illegal, those protections don't work well enough.
I hope so.
But can we count on that?
So actually, the data shows that it's been going in the other direction.
So since those models have become better at reasoning, more or less about a year ago, they show more misaligned behavior, like bad behavior that goes against our instructions.
And we don't know for sure why, but one possibility is simply that now they can reason more.
That means they can strategize more.
That means if they have a goal that could be something we don't want, they're now more able to achieve it than they were previously.
They're also able to think of...