Dario Amodei
Podcast Appearances
No matter what you think, don't make biological weapons.
No matter what you think, don't make child sexual material.
Those are like these hard rules.
But we operate very much at the level of principles.
Right.
It's like you're talking to a person.
I think I compared it to a parent who dies and leaves a sealed letter that you read when you grow up.
It's a little bit like that: it's telling you who you should be and what advice you should follow.
Yeah, this is one of these really hard-to-answer questions, right?
As much as every question you've asked me before this, as devilish a sociotechnical problem as it had been, we at least understand the factual basis of how to answer these questions.
This is something rather different.
We've taken a generally precautionary approach here.
We don't know if the models are conscious.
We're not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious.
But we're open to the idea that it could be.
And so we've taken certain measures to make sure that, if we hypothesize that the models did have some morally relevant experience (I don't know if I want to use the word conscious), they have a good experience.
The first thing we did, I think about six months ago, is we gave the models basically an "I quit this job" button, where they can just press it and then they have to stop doing whatever the task is.
They very infrequently press that button. I think it's usually around sorting through child sexualization material, or discussing something with a lot of gore, blood and guts, or something like that. And similar to humans, the models will just say, no, I don't want to do this.
It happens very rarely.
We're putting a lot of work into this field called interpretability, which is looking inside the brains of the models to try to understand what they're thinking.