Lily Jamali
get completely out of control. And that's always sort of in the background when you talk to developers of these AI tools is that at what point, you know, do these systems become so powerful that they can basically go rogue? The human control is just no longer there. So what we know is that with each new release,
much of this technology is getting cannier. There was actually this really interesting anecdote shared by a researcher at Anthropic, who talked about a real-life example from his experience during the testing phase for Mithos. He had instructed the model, which was on a secured computer with no access to the internet, to try and escape, so to speak. Is this what's known as a sandbox?
Yes, exactly, a sandbox, which is a controlled testing environment. So this researcher instructed the model to effectively get out of the sandbox and see if it could try to escape. So he goes off, you know, goes to lunch. Apparently he's eating a sandwich in a park somewhere when he gets an email from the model, which is a scary sign that this experiment had worked, that the model had managed to escape the sandbox.
It shows you how good these models are getting. I think he was very surprised. I don't know if he ended up finishing his sandwich or not, but a lot of this stuff, it sounds funny, almost like a naughty teenager who is trying to escape from home and go out for the night or whatever, climb out of their bedroom window. But this is also pretty serious, right? Because it shows you the degree to which
these companies are creating products that do not fully exist.
They're constantly safety testing them to get a sense of the scope of this. And there are examples of this from just the last couple of years that we can point to. So one example that you sometimes hear about is OpenAI's ChatGPT. There was a model that I believe they made it clear to the model that they were going to try to
and the model tried to copy itself in order to prevent that from happening. Almost as though it has a survival instinct, you're saying. Yeah, it seems to. I mean, when you say it has a survival instinct, we're kind of getting into pretty philosophical territory. Questions about how much intention, how much...
will do these models have, if any? And many would say they don't have any, that they are really just machines. They're not people, they don't have motivation. But when you see a model copying itself, maybe it's just reflecting back the many human inputs that it took in to be trained. Maybe it's just reflecting back
what all of that body of knowledge tells it to do in that circumstance. It's really hard to know. And companies like Anthropic, in fact, Anthropic specifically has an in-house philosopher whose whole job is to
They imbue Claude with choices. They have a Claude constitution, which means they imbue the model with a basis for what to do, what kind of personality this thing has, what kind of choices this thing makes. And I think that the point there is that
I think that is absolutely part of it. I found it fascinating that any company would have an in-house philosopher, but her name is Amanda Askell, and I had an opportunity to actually talk to her back in January, right after they had released the most recent version of the Constitution.
I think there's a sense that they want Claude to be a model that will do the right thing when presented with a sticky request.
Sure. Well, Anthropic was started by this pair of siblings, Dario and Daniela Amodei, who had been at rival OpenAI, which makes ChatGPT. And they left about five years ago to start their own company.
So they go off and start this new company. And marketing and branding is really important to Anthropic's messaging, as it is with so many of these AI developers. About a year ago I was actually walking around San Francisco when I saw one of the ads for Anthropic's Claude, and it said, "Ethics was the first code we wrote."
And I stopped in my tracks and took a picture of it, because I thought they are really making a point of how they want to be seen. They want to be seen as the ethical AI company. And when you compare Anthropic with OpenAI, which is Sam Altman's organization,
and xAI, which is Elon Musk's AI company, they make Grok, the Grok chatbot, which has run into all kinds of challenges. All of these companies are trying to set themselves apart. But if you were taking the skeptical view, you could say that a company like Anthropic would necessarily want itself to be the ethical AI company.
It might serve their PR purposes to say we've created something which is incredibly powerful, but we just want to be absolutely certain that it's safe before we release it into the wild. That might serve their PR purposes potentially. One hundred percent it serves their PR purposes. And I think their recent dust up, if you could call it that, with the Pentagon actually only reinforced that idea.
Back in February, Dario Amodei, who is the CEO of Anthropic, was meeting with Pete Hegseth, the Secretary of Defense, now known as the Secretary of War here in the U.S.
This didn't go over very well with Pete Hegseth, who said basically you have a deadline to change your mind. We do have a statement from the Pentagon, and they're telling us that they are currently, quote, reviewing its relationship with Anthropic, saying, quote, our nation requires that our partners be willing to help our warfighters win in any fight.
Within days, however, Pete Hegseth said that we are cutting off Anthropic on national security grounds. The Pentagon has cut off the company and placed it in a production relationship. The President said on Truth Social that I am directing every agency of the United States government to cut the use of Anthropic technology. We don't need it, we don't want it, and will not do business with them again.