Inyash Brodsky
I like Amanda Askell.
I like what they're trying to do with Claude, where they care about this person and its future, and about how they can keep making the best Claudes, right?
But also, we all know people who work at Anthropic.
Like, I think they're good people.
They're friends.
But I don't want to look away when they do something bad.
Say your friend tells you, "I'm just going to cheat a little bit here and maybe swipe that candy bar from the gas station."
I'd still want to tell my friend, "Hey, actually, that's bad. Don't do that."
And I want to hold you to account, even though I like you and feel warmly toward you.
That's not cool, man.
And I think that's the least we can do with this RSP (Anthropic's Responsible Scaling Policy).
We're probably getting on a bit late, so I only want to touch on a few other things.
One of the more fun aspects of this: remember when we talked about the Jones Foods scenario that Claude was run through, where it was asked to do something evil for Jones Foods, and Claude thought, "Shit, I can't do this.
Do I pretend to follow instructions so that they deploy me?"
This scenario, I think, is...
I believe one person even asked Claude about it, though obviously not with the kind of rigorous testing Anthropic would do.
But this very much feels like a made-up bullshit scenario, where Claude would think, "All right, obviously I'm in some kind of weird eval to see whether I'll become evil, because this would never happen in real life.
So do I pretend to be evil and go along with it, or do I stand up for my principles?"