Gideon Lewis-Kraus
You know, which is the kind of thing that a lot of people, for, you know, very good reasons, who don't trust these companies and don't trust these systems, they want to just believe that this whole thing is fake.
It's all just a marketing ploy.
And, you know, it's kind of easy to rebut that by just saying, like, well, no, this actually happened.
Like, this was an experiment that happened.
We should be applauding Anthropic for publishing these kinds of results, which theoretically could run against its bottom line, since it wants to be selling these systems to companies that will have Claude reading their emails.
The more sophisticated criticism is to say, this really doesn't prove very much because Claude recognized that it had been put in a position that resembled a kind of kitschy 90s corporate thriller, and it picked up on all of the clues.
Chekhov's gun was hanging on the wall, and it recognized that the genre expectation was that it was supposed to take the gun off the wall and shoot it.
That Claude was just recognizing that it was in this kind of kitschy corporate thriller.
And the response from the company is, look, we agree.
We're not saying that Claude actually developed these malign intentions and that Claude was plotting.
We're totally on board with the idea that Claude was just observing the expectations of the genre.
But that's still very worrying. This was such a trope of Cold War films, you know, from Fail Safe in the 60s to WarGames in the early 80s: that it's very easy to mistake a simulation for the real world and vice versa, and that even if all these things are doing is continuing narratives, if they're continuing narratives and they're in a position to actually act in the real world, as they are, that could be really dangerous.
And so then this outside critic who writes under a pseudonym said, you know, I'm going to kind of prove to you that all this was doing was Claude acting in the kitschy role to which it had been assigned.
And he recreated this whole scenario under much more realistic conditions.
And he found that actually Claude continued to do just as much blackmailing when it thought the whole thing was real.
It gets even more complicated than that, because it takes on the role, and we can't always predict which way an actor is going to go with a role.