Geoffrey Hinton
Podcast Appearances
Okay, but that's decentralized.
It's a trillion real numbers, and nobody quite knows how they work.
Okay, so people have tried doing what's called human reinforcement learning.
So with a language model, you train it up to mimic lots of documents on the web, including possibly things like the diaries of serial killers, which presumably you wouldn't train your kid to read on.
And then after you've trained this monster, what you do is you take a whole lot of not very well-paid people and you get them to ask it questions.
Maybe you tell them what questions to ask it.
But they then look at the answers and rate them for whether that's a good answer to give or whether you shouldn't say that.
And it's basically a morality filter.
And you train it up like that so that it doesn't give such bad answers.
Now, the problem is...
If you release the weights of the model, the connection strengths, then someone else can come along with your model and very quickly undo that.
Yes, it's very easy to get rid of that layer of plugging the holes.
Right.
And really what they're doing with human reinforcement learning is like writing a huge software system that you know is full of bugs, and then trying to fix all the bugs.
It's not a good approach.
So what is the good approach?
Nobody knows, and so we should be doing research on it.
They do on X. They all have the capability of doing that, particularly if you release the weights.
From destroying ourselves.
Just everything.