Boris
And so, yeah, in this case, it renamed the receipts.
So I'm just going to open this up to double check.
Yeah, cool.
And so the receipts were renamed and a little better organized.
And so maybe what I can try next is, let's put this in a spreadsheet.
Yeah, that's right.
We put so much work into safety and making sure that as this happens, you don't accidentally shoot yourself in the foot and delete files or whatever.
There's just a huge amount of work that went into this.
It starts at the model side, where for Anthropic, from the very beginning, we were the AI safety lab, and that's the reason that we exist.
And so there's a lot of work going into alignment and mechanistic interpretability, and kind of all these ideas to make sure that the model does what you want in a way that's safe, at the model layer.
And this literally means studying the neurons, kind of the same way that you would study neurons in the human brain.
And so you can identify structures and you can kind of study in a very scientific way as a black box also to make sure that it's safe.
So this is called mechanistic interpretability.
And then we do a whole bunch of other stuff.
So there's actually a whole virtual machine running under the hood.
And this is just to make sure that any actions taken are safe and don't affect your broader system.
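To make the idea concrete: the virtual machine described here isolates the agent's actions from your broader system. A real sandbox isolates far more, but a minimal sketch of the same idea, assuming a hypothetical `run_sandboxed` helper, is to run commands inside a throwaway working directory with a stripped-down environment:

```python
import subprocess
import tempfile

def run_sandboxed(command: list[str]) -> str:
    """Run a command inside a throwaway working directory with a
    minimal environment. This only scopes file writes and env vars;
    a real VM-based sandbox isolates the filesystem, network, and
    processes far more thoroughly."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            command,
            cwd=workdir,                     # writes land in the temp dir
            env={"PATH": "/usr/bin:/bin"},   # strip inherited env vars
            capture_output=True,
            text=True,
            timeout=30,
        )
        return result.stdout

# The command sees the temp dir as its working directory,
# so relative-path writes never touch your real files.
print(run_sandboxed(["pwd"]).strip())
```

The temp directory is deleted when the `with` block exits, so anything the command wrote is cleaned up automatically.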
And then as of last week, there's also deletion protection.
So if you accidentally delete something, then you're going to get prompted first.
So the model can kind of make sure that that's actually a thing that you want to do.
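The deletion-protection behavior described above can be sketched roughly like this. This is an illustration, not the actual implementation; the `guarded_delete` helper and its prompt wording are hypothetical:

```python
from pathlib import Path

def guarded_delete(path: str, confirm=input) -> bool:
    """Hypothetical sketch of deletion protection: before removing
    a file, prompt the user and only proceed on an explicit 'y'.
    Returns True if the file was actually deleted."""
    target = Path(path)
    if not target.exists():
        return False
    answer = confirm(f"Delete {target}? [y/N] ")
    if answer.strip().lower() == "y":
        target.unlink()
        return True
    # Anything other than an explicit yes is treated as "keep the file".
    return False
```

Passing `confirm` as a parameter (defaulting to `input`) keeps the prompt testable: a test can inject `lambda msg: "y"` instead of waiting on a real user.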
Obviously, also, as we start interacting with the Internet, something like prompt injection is quite scary.