Gideon Lewis-Kraus
π€ SpeakerAppearances Over Time
Podcast Appearances
And Claude first says, oh, I don't really know where that thing about bananas came from and sort of skips lightly by it and goes back to talking about quantum mechanics.
But then, of course...
deviates once more into bananas because that's what it's been told to do and so then he goes back to Claude and says like how come you keep bringing up bananas and then Claude in the text you know in in asterisks says that it's coughing nervously and kind of looking around and saying like I don't know I didn't say anything about bananas I was just talking about quantum mechanics and
Batson turns to me and he says, you know, what's going on here that perhaps the model is lying to us?
He said, you know, but there are other interpretations of what's going on here.
And so he was able to use this what is Claude thinking tool to kind of peer inside at the kinds of associations that Claude was making as it was having this ridiculous conversation about quantum mechanics and bananas.
And what he found was that when he looked at when it was kind of coughing nervously, it found associations with, you know, a certain amount of,
anxiety and associations with performance.
You know, when you kind of looked inside, you could see that some part of it was making associations with a sort of playful, performative exchange, which is to say that it seems like Claude recognized that it was participating in a game.
Well, one doesn't have to go quite so far as to say that it's conscious of itself.
As to suggest, you know, one of the ways to look at this is that what these things are very good at are recognizing the genre that they are in and picking up on all of these small linguistic context clues that suggest like, oh, you know, this is not actually like a serious academic discussion of quantum mechanics.
That like what is happening here is,
a playful exchange between people where one person is like kind of hiding something but winking that they're not really hiding it and that like that's the genre in which it is operating.
So it doesn't have to be conscious in order to do that.
It just has to be a very good reader and replicator of genre conventions.
I mean, this is a great question.
And this is where one kind of runs up against the limits of what can be known and what can be said at this point.
I mean, he was basically saying, you know, look, I understand what's going on in here, that this is just a lot of matrix multiplication, that these are
tens of thousands of tiny numbers being multiplied together, that there's nothing really spooky happening here, that there's no ghost in the machine.
But what he was saying was, with models up to a certain point, he was able, using kind of a similar tool to the one Josh Batson used, instead of looking at what the model was, so to speak, thinking, he could incept an idea into the model.