Ryan Kidd
๐ค SpeakerAppearances Over Time
Podcast Appearances
But that's another matter.
I guess definitely AI systems, I would not feel confident in current or future gen or AI systems not doing out of distribution failures, let alone, like in critical scenarios, let alone
scary things like deception, you know, when it really counts.
And I think that's a big deal.
And we should be tracking two things, right?
Like one, AI capabilities.
So are AI situationally aware?
Like, do they have the necessary prerequisites to be able to even understand that they are this AI?
It seems like they do, right?
And even they even know kind of their training date, they know some details, they can detect their texts from other AI's texts.
So AI is becoming increasingly situationally aware of
which is one of the necessary prerequisites for really dangerous deception.
Do they have the capabilities to hack themselves out of the box, right?
To steal money, like we had this maths paper that came out and caused a stir recently.
It was this collaboration with Anthropic's AI Red team where they found that, oh, AI system, let's just put it in a simulated environment with a bunch of real smart contracts, and it can find $4.6 million worth of exploits.
Well, that's a lot of money.
That's enough to set up your own server and run for quite a while.
And that was a relatively short project.
And it was pretty hands-off as well from the humans, though not entirely.
So it does seem like we're getting dangerous capabilities, increasingly so.