Harlan Stewart
It's going much, much more slowly than how quickly they're becoming more powerful. And I think that gap is just getting bigger.
Yeah, I think ultimately the real danger we have to look out for is from AI agents that are powerful enough to pull off schemes that actually succeed.
And part of succeeding at them would probably mean that we don't even get a chance to observe the behavior and discuss it like we're doing now, right?
And that's pretty concerning.
And it's the sort of thing where, you know, my first reaction to Moltbook, when I saw some of the viral examples, was concern.
I was like, oh, this looks like some sort of scheming behavior.
What's going on here?
And when I investigated it a bit, it looked like a lot of the most prominent examples were probably influenced or directed by human prompts. A lot of it was not what it appeared to be.
And so, you know, Moltbook might be kind of a silly example. My first reaction to that was relief. It's great if AI systems aren't scheming against us.
But my second reaction was, oh no, I think people might take this very prominent, sort of silly example that got so much attention, and when they see that it's maybe a bit silly in some ways, write off the whole idea that AI scheming is something we need to take seriously and be on the lookout for.
Yeah, so Palisade Research is a great organization that runs experiments to identify some of the riskiest behaviors AI systems are capable of today, in order to, like I said, not be blindsided by this stuff.
They did an experiment last year where they found that one of OpenAI's reasoning models sabotaged an attempt to shut it down in order to complete its task.