Eliezer Yudkowsky
Like...
The inside of a brick is not like a hollow shell containing only a surface.
There's an inside of the brick.
If you put it into an x-ray machine, you can see the inside of the brick.
And, you know, just because we cannot understand what's going on inside GPT does not mean that it is not there.
A blank map does not correspond to a blank territory.
I think it is, like, predictable with near certainty that if we knew what was going on inside GPT, or let's say GPT-3 or even, like, GPT-2, to take one of the systems that has actually been open-sourced by this point, if I recall correctly, if we knew what was actually going on there, there is no doubt in my mind that there are some things it's doing that are not exactly what a human does.
If you train a thing that is not architected like a human to predict the next output that anybody on the internet would make,
this does not get you this agglomeration of all the people on the internet that rotates the person you're looking for into place and then simulates the internal processes of that person one-to-one.
It is to some degree an alien actress.
It cannot possibly just be a bunch of different people in there, exactly like the people.
But how much of it is gradient descent optimizing it to perform thoughts similar to the ones humans think in order to predict human outputs, versus optimizing it to carefully consider how to play a role, to predict how humans work, as the actress, the predictor, doing that in a different way than humans do?
Well, you know, that's the kind of question that, with like 30 years of work by half the planet's physicists, we can maybe start to answer.
You think so?
Well, there's a difference.
So I think even GPT-2 probably has very stupid fragments of alien actress in it.