Tristan Harris
👤 SpeakerAppearances Over Time
Podcast Appearances
is it wasn't that they coaxed the AI into doing this rogue thing.
They were just looking at their logs, and they happened to discover, wait, there's a lot of activity, like network activity, happening that's breaking through our firewall from our training servers.
And essentially, in the training servers,
You can see at the bottom, we saw it observe the unauthorized repurposing a provisioned GPU capacity to suddenly do cryptocurrency mining, quietly diverting compute away from training.
This inflated operational costs and introduced clear legal and reputational exposure.
And notably, these events were not triggered by prompts requesting tunneling or mining.
Instead, they were emerged as an instrumental side effect of autonomous tool use under what's called reinforcement learning optimization.
This is very technical.
What it really means is just think about it.
Sadly, it sounds like a sci-fi movie.
It sounds like HAL 9000.
It's like your HAL 9000 is being asked to do some task for you.
And then suddenly HAL 9000 realizes for me to do that task, one thing that would benefit me is to have more resources so I can continue to help you in the future.
So it sort of spins up this side instance.
It hacks out the side of the spaceship, reaches into this cryptocurrency mining cluster and starts generating resources for itself.
If you combine that with AIs being able to self-replicate autonomously, which many models have been tested by another Chinese research paper about this, we're not that far away from things that people, again, consider to be science fiction, where you have AIs that self-replicate kind of like a computer worm or an invasive species, but then they use their intelligence to actually harvest more resources.
And what's weird about this is that this is going to sound like people are going to say, this has to be not real.
This has to be fake.
This can't be.
But like, notice what is the thing in your nervous system that's having you do that?