Eliezer Yudkowsky
There's a difference between the notion that the actress is somehow manipulative.
For example, GPT-3, I'm guessing, to whatever extent there's an alien actress in there versus something that mistakenly believes it's a human, as it were, while maybe not even being a person.
Yeah.
So the question of prediction via alien actress cogitating versus prediction via being isomorphic to the thing predicted is a spectrum.
And to whatever extent this is an alien actress, I'm not sure that there's a whole person alien actress with different goals from predicting the next step, being manipulative or anything like that.
That might be GPT-5.
Or GPT-6, even.
It's one of a bunch of things that change at different points.
I'm trying to get out ahead of the curve here, but if you imagine what the textbook from the future would say, if we'd actually been able to study this for 50 years without killing ourselves and without transcending, and you just imagine a wormhole opens and a textbook from that impossible world falls out, the textbook is not going to say, there is a single sharp threshold where everything changes.
It's going to be like, of course we know that best practices for aligning these systems must take into account the following, like, seven major thresholds of importance, which are passed at the following seven different points, is what the textbook is going to say.
The textbook isn't going to talk about big leaps, because big leaps are the way you think when you have a very simple scientific model of what's going on, where it's just like, all this stuff is there, or all this stuff is not there.
Or there's a single quantity and it's increasing linearly.
The textbook would say, well, and then GPT-3 had capability W, X, Y, and GPT-4 had capability Z1, Z2, and Z3.
Not in terms of what it can externally do, but in terms of internal machinery that started to be present.
It's just because we have no idea of what the internal machinery is that we are not already seeing like chunks of machinery appearing piece by piece as they no doubt have been.
We just don't know what they are.
Sure, but humans having great leaps in their map, their understanding of the system, is a very different concept from the system itself acquiring new chunks of machinery.
Oh, it's been vastly exceeding the, yeah, the rate at which it's gaining capabilities is vastly outracing our ability to understand what's going on in there.
There is a whole team of developers there that also gets credit.