Eliezer Yudkowsky
Podcast Appearances
Because unless you specifically want to end up in the state where you're looking out saying, I am here, I look out at this universe with wonder, if you don't want to preserve that, it doesn't get preserved when you grind really hard to be able to get more of the stuff.
We would choose to preserve that within ourselves because it matters and on some viewpoints is the only thing that matters.
I think the human alignment problem is a terrible phrase because it is very, very different to try to build systems out of humans, some of whom are nice and some of whom are not nice and some of whom are trying to trick you, and build a social system out of large populations of those who are all at basically the same level of intelligence.
Yes, IQ this, IQ that, but that variation is small next to humans versus chimpanzees.
It is very different to try to solve that problem than to try to build an AI from scratch, especially if, God help you, you are trying to use gradient descent on giant inscrutable matrices.
They're just very different problems, and I think that all the analogies between them are horribly misleading.
I don't think you are trying to do that on your first try.
I think on your first try, you are trying to build, you know, okay, like...
Probably not what you should actually do, but let's say you were trying to build something like AlphaFold 17, and you are trying to get it to solve the biology problems associated with making humans smarter, so that humans can actually solve alignment. So you've got a super biologist, and I think what you would want in that situation is for it to just be thinking about biology, and not thinking about a very wide range of things that includes how to kill everybody.
And I think that the first AIs you're trying to build, not a million years later, the first ones, look more like narrowly specialized biologists than like something that gets the full complexity and wonder of human experience in there in such a way that it wants to preserve itself, even as it becomes much smarter, which is a drastic system change.
It's going to have all kinds of side effects that, you know, like if we're dealing with giant inscrutable matrices, we're not very likely to be able to see coming in advance.
No, it's a shadow cast by humans on the internet.
But don't you think that shadow is a Jungian shadow?
I think that if you had alien super intelligences looking at the data, they would be able to pick up from it an excellent picture of what humans are actually like inside.
This does not mean that if you have a loss function of predicting the next token from that dataset, the mind picked out by gradient descent to predict the next token as well as possible on a very wide variety of humans is itself a human.
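To make the setup being described concrete, here is a minimal sketch, not taken from the conversation, of "a loss function of predicting the next token, minimized by gradient descent." The vocabulary size, the toy bigram parameterization, and the learning rate are all illustrative assumptions, not anything a real language model actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16
data = rng.integers(0, VOCAB, size=1000)   # stand-in for a corpus of human-written tokens
logits = np.zeros((VOCAB, VOCAB))          # toy parameter matrix: row = current token

def loss_and_grad(logits, tokens):
    """Mean cross-entropy of predicting token t+1 from token t, and its gradient."""
    prev, nxt = tokens[:-1], tokens[1:]
    z = logits[prev]                                        # (N, VOCAB) scores
    z = z - z.max(axis=1, keepdims=True)                    # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)    # softmax over next token
    loss = -np.log(p[np.arange(len(nxt)), nxt]).mean()
    g = p.copy()
    g[np.arange(len(nxt)), nxt] -= 1.0                      # d(loss)/d(scores), per example
    grad = np.zeros_like(logits)
    np.add.at(grad, prev, g / len(nxt))                     # accumulate into parameter rows
    return loss, grad

for step in range(200):                                     # plain gradient descent
    loss, grad = loss_and_grad(logits, data)
    logits -= 0.5 * grad

# Nothing in this objective asks the predictor to resemble whatever generated the data;
# it only rewards assigning high probability to the next token.
```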
I think that if you sent me to a distant galaxy with aliens who are like much, much stupider than I am, so much so that I could do a pretty good job of predicting what they'd say, even though they thought in an utterly different way from how I did,
that I might in time be able to learn how to imitate those aliens if the intelligence gap was great enough that my own intelligence could overcome the alienness.
And the aliens would look at my outputs and say, is there not a deep name of alien nature to this thing?