Roman Yampolsky
Podcast Appearances
It's an option. I have a paper where I try to solve the value alignment problem for multiple agents. And the solution, to avoid compromise, is to give everyone a personal virtual universe. You can do whatever you want in that world. You could be a king, you could be a slave, you decide what happens.
So it's basically a glorified video game where you get to enjoy yourself and someone else takes care of your needs and the substrate alignment is the only thing we need to solve. We don't have to get 8 billion humans to agree on anything.
Some people say that's what happened. We're in a simulation.
And some people choose to play on a more difficult level, with more constraints. Others say, okay, I'm just going to enjoy the game at a high privilege level. Absolutely.
Personal universes. Personal universes.
In order to solve the value alignment problem, I'm trying to formalize it a little better. Usually we're talking about getting AIs to do what we want, which is not well defined. Are we talking about the creator of the system, the owner of that AI, humanity as a whole? We don't agree on much. There is no universally accepted ethics or morals across cultures and religions.
People have individually very different preferences, politically and otherwise. So even if we somehow managed all the other aspects of it, programming those fuzzy concepts in, getting AI to follow them closely, we don't agree on what to program in. So my solution was: okay, we don't have to compromise on room temperature. You have your universe, I have mine, whatever you want.
And if you like me, you can invite me to visit your universe. We don't have to be independent, but the point is you can be. And virtual reality is getting pretty good. It's going to hit a point where you can't tell the difference. And if you can't tell if it's real or not, what's the difference?
You still have to align with that individual. They have to be happy in that simulation. But it's a much easier problem to align with one agent versus 8 billion agents plus animals, aliens.
I'm trying to do that, yeah.
It seems contradictory. I haven't seen anyone explain what it means outside of kind of words which pack a lot, make it good, make it desirable, make it something they don't regret. But how do you specifically formalize those notions? How do you program them in? I haven't seen anyone make progress on that so far.
Right. But the examples you gave, some of them are, for example, two different religions saying this is our holy site and we are not willing to compromise it in any way. If you can make two holy sites in virtual worlds, you solve the problem. But if you only have one, it's not divisible. You're kind of stuck there.
If we go back to that idea of simulation and this is entertainment kind of giving meaning to us, the question is how much suffering is reasonable for a video game? So yeah, I don't mind a video game where I get haptic feedback, there is a little bit of shaking, maybe I'm a little scared. I don't want a game where kids are tortured, literally. That seems unethical, at least by our human standards.
So we know there are some humans who, because of a mutation, don't experience physical pain. So at least physical pain can be mutated out, re-engineered out. Suffering in terms of meaning, like you burn the only copy of my book, is a little harder. But even there, you can manipulate your hedonic set point, you can change defaults, you can reset.
The problem with that is if you start messing with your reward channel, you start wireheading and end up blissing out a little too much.
I think we need that, but I would change the overall range. Right now the pain-pleasure axis goes from negative infinity to positive infinity. I would make it go from zero to positive infinity, so being unhappy just means I'm close to zero.
So there are many malevolent actors. We can talk about psychopaths, crazies, hackers, doomsday cults. We know from history they tried killing everyone. They tried on purpose to cause maximum amount of damage, terrorism. What if someone malevolent wants on purpose to torture all humans as long as possible?
You solve aging, so now you have functional immortality, and you just try to be as creative as you can.
So there are different malevolent agents. Some maybe just gaining personal benefit and sacrificing others to that cause. Others, we know for a fact, are trying to kill as many people as possible. And we look at recent school shootings. If they had more capable weapons, they would take out not dozens, but thousands, millions, billions.
There are mental diseases where people don't have empathy, don't have this human quality of understanding suffering in others.
Again, I would like to assume that normal people never think like that. It's always some sort of psychopaths, but yeah.
They can certainly be more creative. They can understand human biology better, understand our molecular structure, genome. Again, a lot of times torture ends and the individual dies. That limit can be removed as well.
Right. We can definitely keep up for a while. I'm saying you cannot do it indefinitely. At some point, the cognitive gap is too big. The surface you have to defend is infinite. But attackers only need to find one exploit.
If we create general super intelligences, I don't see a good outcome long-term for humanity. The only way to win this game is not to play it.
I don't know for sure. The prediction markets right now are saying 2026 for AGI. I heard the same thing from CEO of Anthropic, DeepMind, so maybe we are two years away, which seems very soon given we don't have a working safety mechanism in place or even a prototype for one. And there are people trying to accelerate those timelines because they feel we're not getting there quick enough.
So the definitions we used to have, and people are modifying them a little bit lately: artificial general intelligence was a system capable of performing in any domain a human could perform in. So you're kind of creating this average artificial person. They can do cognitive labor, physical labor, anything you could get another human to do.
Superintelligence was defined as a system which is superior to all humans in all domains. Now people are starting to refer to AGI as if it's superintelligence. I made a post recently where I argued, for me at least, if you average out over all the common human tasks, those systems are already smarter than an average human. So under that definition, we have it.
Shane Legg has this definition where you're trying to win in all domains. That's what intelligence is. Now, are they smarter than elite individuals in certain domains? Of course not. They're not there yet. But the progress is exponential.
That would be enough to bootstrap the whole process.
Human level is general in the domain of expertise of humans. We know how to do human things. I don't speak dog language. I should be able to pick it up if I'm a general intelligence; it's supposedly an inferior animal, so I should be able to learn that skill, but I can't. A truly universal general intelligence should be able to do things like that, things humans cannot do.
To solve pattern recognition problems of that type, to do other similar things outside of our domain of expertise, because it's just not the world we live in.
We know calculators are smarter than humans in that narrow domain of addition.
So when I think about it, I usually think human with a paper and a pencil, not human with internet and other AI helping.
But we create AI. So at any point, you'll still just add superintelligence to human capability? That seems like cheating.
It seems like a hybrid of some kind. You're now doing brain-computer interfaces. You're connecting it to maybe narrow AIs. Yeah, it definitely increases our capabilities.
I am old-fashioned. I like the Turing test. I have a paper where I equate passing the Turing test to solving AI-complete problems, because you can encode questions about any domain into the Turing test. You don't have to talk about how your day was; you can ask anything. And so the system has to be as smart as a human to pass it in a true sense.
It has to be long enough to where you can make some meaningful decisions about capabilities, absolutely. You can brute force very short conversations.
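To make the "brute force" point concrete, here is a minimal sketch (my illustration, not something from the conversation) of a chatbot that pre-scripts answers to a handful of common short exchanges; it can survive one polite turn, but has nothing to say once the conversation runs long or goes off-script, which is why test length matters.

```python
# Hypothetical "brute force" chatbot: canned replies to a few common short exchanges.
CANNED_REPLIES = {
    "how was your day?": "Pretty good, thanks. How about yours?",
    "what's your name?": "I'm Alex.",
    "where are you from?": "Originally from Ohio.",
}

def reply(utterance: str) -> str:
    # Fall back to a vague deflection for anything outside the script.
    return CANNED_REPLIES.get(utterance.strip().lower(), "Sorry, could you rephrase that?")

print(reply("How was your day?"))  # convincing for a single short turn
print(reply("Prove that the square root of 2 is irrational."))  # the script runs out
```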
For AGI, it has to be there. There cannot be a task I can give to a human that it cannot do. For superintelligence, it would be superior on all such tasks, not just average performance. Go learn to drive a car. Go speak Chinese. Play guitar. Okay, great.
You can develop a test which will give you positives if it lies to you or has those ideas. You cannot develop a test which rules them out. There is always possibility of what Bostrom calls a treacherous turn, where later on a system decides for game theoretic reasons, economic reasons to change its behavior. And we see the same with humans. It's not unique to AI.
For millennia, we tried developing morals, ethics, religions, lie detector tests, and then employees betray their employers, spouses betray their families. It's a pretty standard thing intelligent agents sometimes do.
If you know the truth and it tells you something false, you can detect that, but you cannot know in general every single time. And again, the system you're testing today may not be lying; it may know you are testing it and behave accordingly.
And later on, after it interacts with the environment, interacts with other systems, malevolent agents, learns more, it may start doing those things.
So systems today don't have long-term planning. But they can lie today if it helps them optimize the reward. If they realize, okay, this human will be very happy if I tell them the following, they will do it if it brings them more points. And they don't have to keep track of it; it's just the right answer to this problem every single time.
Well, some people think that if they're that smart, they're always good. They really do believe that. It's just benevolence from intelligence. So they'll always want what's best for us. Some people think that they will be able to detect problem behaviors and correct them at the time when we get there. I don't think it's a good idea. I am strongly against it.
But yeah, there are quite a few people who, in general, are so optimistic about this technology, it could do no wrong. They want it developed as soon as possible, as capable as possible.
There are even people who say, okay, what's so special about humans, right? We removed the gender bias. We're removing race bias. Why this pro-human bias? We are polluting the planet. We, as you said, fight a lot of wars, we're kind of violent. Maybe it's better if a superintelligent, perfect society comes and replaces us. It's a normal stage in the evolution of our species.
I cannot make a case that he's right. He's wrong in so many ways, it's difficult for me to remember all of them. He's a Facebook buddy, so I have a lot of fun having those little debates with him. So I'm trying to remember the arguments. So one, he says we are not gifted this intelligence by aliens; we are designing it, we are making decisions about it. That's not true.
It was true when we had expert systems, symbolic AI, decision trees. Today, you set up parameters for a model and you water this plant. You give it data, you give it compute, and it grows. And after it's finished growing into this alien plant, you start testing it to find out what capabilities it has. And it takes years to figure out, even for existing models.
If it's trained for six months, it will take you two, three years to figure out basic capabilities of that system. We still discover new capabilities in systems which are already out there. So that's not the case.
Absolutely. That's what makes it so successful. Back then we had to painstakingly hard-code everything in, and we didn't have much progress. Now you just spend more money and more compute and it's a lot more capable.
Let's say there is a ceiling. It's not guaranteed to be at the level which is competitive with us. It may be greatly superior to ours.
Historically, he's completely right. Open source software is wonderful. It's tested by the community, it's debugged. But we're switching from tools to agents. Now you're giving open-source weapons to psychopaths. Do we want to open-source nuclear weapons? Biological weapons?
It's not safe to give technology so powerful to those who may misalign it, even if you are successful at somehow getting it to work in the first place in a friendly manner.
It also sets a very wrong precedent. So we open sourced model one, model two, model three, nothing ever bad happened. So obviously we're gonna do it with model four. It's just gradual improvement.
So I have a paper which collects accidents through the history of AI, and they are always proportional to the capabilities of that system. So if you have a tic-tac-toe-playing AI, it will fail to play properly and lose a game it should have drawn. Trivial. Your spell checker will misspell a word, and so on.
I stopped collecting those because there are just too many examples of AIs failing at what they are capable of. We haven't had... terrible accidents in the sense of billion people get killed. Absolutely true. But in another paper, I argue that those accidents do not actually prevent people from continuing with research. And actually, they kind of serve like vaccines.
A vaccine makes your body a little bit sick, so you can handle the big disease later much better. It's the same here. People will point out, you know that accident, AI accident we had where 12 people died? Everyone's still here. 12 people is less than smoking kills. It's not a big deal. So we continue. So in a way, it will actually be kind of confirming that it's not that bad.
So you bring up example of cars. Yes, cars were slowly developed and integrated. If we had no cars, and somebody came around and said, I invented this thing. It's called cars. It's awesome. It kills like 100,000 Americans every year. Let's deploy it. Would we deploy that?
You need data. You need to know. But if I'm right and it's unpredictable, unexplainable, uncontrollable, you cannot make this decision, we're gaining $10 trillion of wealth, but we're losing, we don't know how many people. You basically have to perform an experiment on 8 billion humans without their consent.
And even if they want to give you consent, they can't because they cannot give informed consent. They don't understand those things.
We're literally doing it. The previous model we learned about after we finished training it, what it was capable of. Let's say we stop GPT-4 training run around human capability, hypothetically. We start training GPT-5, and I have no knowledge of insider training runs or anything. And we start at that point of about human, and we train it for the next nine months.
Maybe two months in, it becomes superintelligent. We continue training it. At the time when we start testing it, it is already a dangerous system. How dangerous? I have no idea. But neither do the people training it.
If we had the capability, ahead of the training run, to register exactly what capabilities that next model will have at the end of the run, and we accurately guessed all of them, I would say, you're right, we can definitely go ahead with this run. We don't have that capability.
We're not talking just about capabilities, specific tasks. We're talking about general capability to learn. Maybe like a child at the time of testing and deployment, it is still not extremely capable, but as it is exposed to more data, real world, it can be trained to become much more dangerous and capable.
So I think at some point it becomes capable of getting out of control. For game theoretic reasons, it may decide not to do anything right away and for a long time just collect more resources, accumulate strategic advantage. Right away, it may be kind of still young, weak superintelligence. Give it a decade, it's in charge of a lot more resources. It had time to make backups.
So it's not obvious to me that it will strike as soon as it can.
We've been doing it for years. Software controls all the systems: nuclear power plants, the airline industry, it's all software-based. Every time there is an electrical outage, I can't fly anywhere for days.
No, but if it shows that it is in fact safer, that when it's in control we get better results, people will demand that it be put in place.
And if not, it can hack the system. It can use social engineering to get access to it. That's why I said it might take some time for it to accumulate those resources.
I really hope you're right, but it's not what I'm seeing. People are very quick to jump on the latest trend. Early adopters will be there before it's even deployed, buying prototypes.
So we've been deploying systems which had hidden capabilities.
GPT-4. I don't know what else it's capable of; there are still things we haven't discovered it can do. They may be trivial in proportion to its capability, I don't know. It writes Chinese poetry, hypothetically. I know it does. But we haven't tested for all possible capabilities, and we are not explicitly designing them. We can only rule out bugs we find.
We cannot rule out bugs and capabilities because we haven't found them.
Again, we can only ask and test for things we know about. If there are unknown unknowns, we cannot do it. I'm thinking of human savants, right? If you talk to a person like that, you may not even realize they can multiply 20-digit numbers in their head. You have to know to ask.
So two things. One, we're switching from tools to agents. Tools don't have negative or positive impact. People using tools do. So guns don't kill. People with guns do. Agents can make their own decisions. They can be positive or negative. A pit bull can decide to harm you as an agent. The fears are the same. The only difference is now we have this technology.
They were afraid of humanoid robots 100 years ago, and back then they had none. Today, every major company in the world is investing billions to create them. Not every, but you understand what I'm saying. It's very different.
They are saying they are building super intelligence and have a super alignment team. You don't think they are trying to create a system smart enough to be an independent agent under that definition?
Those systems are well beyond narrow AI. If you had to list all the capabilities of GPT-4, you would spend a lot of time writing that list.
Not yet. But do you think any of those companies are holding back because they think it may be not safe or are they developing the most capable system they can given the resources and hoping they can control and monetize?
I mean, I can't speak for other people, for all of them. I think some of them are very ambitious. They fundraise in the trillions. They talk about controlling the light cone of the universe. I would guess that they might.
I really hope you're right. I think the scaling hypothesis is correct. We haven't seen diminishing returns. It used to be we asked how long before AGI. Now we should ask how much until AGI. It's a trillion dollars today. It's a billion dollars next year. It's a million dollars in a few years.
Compute gets cheaper every day, exponentially.
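A rough sketch of the arithmetic behind "how much until AGI", with invented numbers (not a forecast from the conversation): if some fixed training run costs a trillion dollars today and the required compute gets twice as cheap on some regular cadence, the price tag falls on its own.

```python
# Toy arithmetic: assume a fixed-capability training run costs `cost_today` dollars
# and the required compute halves in price every `halving_years` years.
def cost_after(years: float, cost_today: float = 1e12, halving_years: float = 2.0) -> float:
    return cost_today * 0.5 ** (years / halving_years)

for y in (0, 10, 20, 30, 40):
    print(f"year {y:2d}: ~${cost_after(y):,.0f}")
# With these assumed numbers the $1T run drops below $1B after ~20 years and below
# $1M after ~40; the exact dates depend entirely on the assumed halving time.
```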
If the only disagreement is that it will take decades, not years, for everything I'm saying to materialize, then I can go with that.
There is a lot to unpack here. There is the Partnership on AI, a conglomerate of many large corporations. They have a database of AI accidents they collect. I contributed a lot to that database.
If we have so far made almost no progress in actually solving this problem, not patching it, not, again, lipstick-on-a-pig kind of solutions, why would we think we'll do better when we're closer to the problem?
Again, I want to be perfectly clear. I love AI. I love technology. I'm a computer scientist. I have a PhD in engineering. I work at an engineering school. There is a huge difference between saying we need to develop narrow AI systems, superintelligent at solving specific human problems like protein folding, and saying let's create a superintelligent machine, a kind of god, and it will decide what to do with us.
Those are not the same. I am against superintelligence in the general sense, with no undo button.
Partially, but they don't scale. For narrow AI, for deterministic systems, you can test them. You have edge cases. You know what the answer should look like. You know the right answers. For general systems, you have infinite test surface. You have no edge cases. You cannot even know what to test for. Again, the unknown unknowns are underappreciated by... people looking at this problem.
You are always asking me, how will it kill everyone? How will it fail? The whole point is, if I knew, I would be superintelligent, and despite what you might think, I'm not.
It is a master at deception. Sam tweeted about how great it is at persuasion. And we see it ourselves, especially now with voices, with maybe kind of flirty, sarcastic female voices. It's gonna be very good at getting people to do things.
Right. I don't think developers know everything about what they are creating. They have lots of great knowledge. We're making progress on explaining parts of a network. We can understand, okay, this node gets excited when this input is presented, this cluster of nodes. But we're nowhere near close to understanding the full picture, and I think it's impossible.
You need to be able to survey an explanation. The size of those models prevents a single human from observing all that information, even if provided by the system. So either we're getting the model itself as an explanation for what's happening, and that's not comprehensible to us, or we're getting a compressed explanation, a lossy compression: here are the top 10 reasons you got fired.
It's something, but it's not a full picture.
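A minimal sketch of what such a lossy, compressed explanation could look like in practice; the function, feature names, and numbers are hypothetical, chosen only to show how a "top reasons" summary discards almost all of the model's actual decision process.

```python
# Hypothetical feature attributions for one decision; a real model would have
# millions of tiny contributions that this "explanation" silently drops.
attributions = {
    "missed_deadlines": 0.41,
    "sales_numbers": 0.33,
    "manager_rating": 0.17,
    "peer_feedback": -0.08,
    "tenure": -0.02,
}

def top_k_reasons(attr: dict, k: int = 3) -> list:
    # Keep only the k largest contributions by magnitude -- a lossy compression.
    return sorted(attr.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]

print(top_k_reasons(attributions))  # "top reasons you got fired": true, but not the full picture
```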
So there is a paper, I think it came out last week, by Dr. Park et al. from MIT, and they showed that existing models already display successful deception in what they do. My concern is not that they lie now and we need to catch them and tell them not to lie. My concern is that once they are capable and deployed, they will later change their mind, because that's what unrestricted learning allows you to do. Lots of people grow up in, maybe, a religious family; they read some new books and they turn on their religion. That's a treacherous turn in humans. If you learn something new about your colleagues, maybe you'll change how you react to them.
And you can't say they are not rational. The rational decision changes based on your position. When you are under the boss, the rational policy may be to follow orders and be honest. When you become the boss, the rational policy may shift.
The robots are coming. There's a refrigerator making a buzzing noise. Very menacing, very menacing. So every time I'm about to talk about this topic, things start to happen. My flight yesterday was canceled without possibility to rebook. I was giving a talk at Google in Israel and three cars, which were supposed to take me to the talk, could not. I'm just saying. I like AIs.
I for one welcome our overlords.
My claim is, again, that there are very strong limits on what we can and cannot verify. A lot of times when you post something on social media, people go, oh, I need a citation to a peer-reviewed article. But what is a peer-reviewed article? You found two people in a world of hundreds of thousands of scientists who said, sure, go ahead and publish it, I don't care. That's the verifier of that process.
When people say, oh, it's formally verified software, a mathematical proof, they expect something close to a 100% chance of it being free of all problems. But if you actually look at the research, software is full of bugs. Old mathematical theorems, which had been accepted as proven for hundreds of years, have been discovered to contain bugs, on top of which we generated new proofs, and now we have to redo all that.
So, verifiers are not perfect. Usually, they are either a single human or communities of humans, and it's basically kind of like a democratic vote. Community of mathematicians agrees that this proof is correct, mostly correct. Even today, we're starting to see some mathematical proofs are so complex, so large, that mathematical community is unable to make a decision.
It looks interesting, it looks promising, but they don't know. They will need years for top scholars to study it, to figure it out. So of course we can use AI to help us with this process, but AI is a piece of software which needs to be verified.
Right. And for AI, we would like to have that level of confidence. For very important mission-critical software controlling satellites, nuclear power plants, for small deterministic programs, we can do this. We can verify that the code maps to the design, that whatever the software engineers intended was correctly implemented.
But we don't know how to do this for software which keeps learning, self-modifying, rewriting its own code. We don't know how to prove things about the physical world, about the states of humans in the physical world. So there are papers coming out now, and I have this beautiful one, Towards Guaranteed Safe AI. Very cool paper. Some of the best authors I've ever seen.
I think there are multiple Turing Award winners among the authors. You can have this one. Another just came out, kind of similar, Managing Extreme AI Risks. So all of them expect this level of proof, but I would say that we can get more confidence with the more resources we put into it. At the end of the day, though, we're still only as reliable as the verifiers, and you have this infinite regress of verifiers.
The software used to verify a program is itself a piece of software. If aliens gave us a well-aligned superintelligence, we could use that to create our own safe AI. But it's a catch-22: you need an already-proven-safe system to verify this new system of equal or greater complexity.
So when I wrote the paper Artificial Intelligence Safety Engineering, which kind of coined the term AI safety, that was 2011. We had a 2012 conference and a 2013 journal paper. One of the things I proposed was: let's just do formal verification on it. Let's do mathematical formal proofs. In the follow-up work, I basically realized it will still not get us to 100%. We can get to 99.9.
We can put more resources exponentially and get closer, but we never get to 100%. If a system makes a billion decisions a second and you use it for 100 years, you're still going to deal with a problem. This is wonderful research. I'm so happy they're doing it. This is great, but it is not going to be a permanent solution to that problem.
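Here is the back-of-the-envelope version of that claim, with assumed reliability numbers (my illustration): even a per-decision failure probability that sounds absurdly small produces thousands of expected failures at the stated scale.

```python
# Assumed numbers: a system making a billion decisions per second, run for a
# century, with a tiny per-decision failure probability p.
decisions_per_second = 1e9
seconds_per_century = 100 * 365.25 * 24 * 3600                 # ~3.16e9 seconds
total_decisions = decisions_per_second * seconds_per_century   # ~3.16e18 decisions

for p in (1e-12, 1e-15, 1e-18):
    print(f"p = {p:g}: expected failures ~ {total_decisions * p:,.0f}")
# Even at p = 1e-15 -- fifteen nines of reliability -- that is roughly 3,000
# expected failures over the deployment.
```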
There are many, many levels. So first you're verifying the hardware in which it is run. You need to verify communication channel with the human. Every aspect of that whole world model needs to be verified. Somehow it needs to map the world into the world model. Map and territory differences. So how do I know internal states of humans? Are you happy or sad? I can't tell.
So how do I make proofs about real physical world? Yeah, I can verify that deterministic algorithm follows certain properties. That can be done. Some people argue that maybe just maybe two plus two is not four. I'm not that extreme. But once you have sufficiently large proof over sufficiently complex environment, the probability that it has zero bugs in it is greatly reduced.
If you keep deploying this a lot, eventually you're going to have a bug anyways.
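A similar toy calculation for proofs and verified code, under the unrealistic assumption (mine, for illustration) that each step independently hides an error with some small probability: confidence in a flawless artifact decays with its size.

```python
# If each of n proof steps (or lines of verified code) hides an error with
# probability p, the chance the whole artifact is flawless is (1 - p)^n,
# roughly exp(-n * p) for small p.
def prob_flawless(n_steps: int, p_error: float) -> float:
    return (1.0 - p_error) ** n_steps

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} steps: P(flawless) ~ {prob_flawless(n, 1e-6):.6f}")
# ~0.999 for a thousand steps, ~0.905 for a hundred thousand, ~0.000045 for ten
# million: a bigger proof over a more complex environment means lower confidence.
```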
There is always a bug. And the fundamental difference is what I mentioned. We're not dealing with cybersecurity. We're not going to get a new credit card, new humanity.
You can improve the rate at which you are learning. You can become a more efficient meta-optimizer.
So if you have fixed code, for example, you can verify that code: static verification at that point in time. But if it keeps modifying itself, you have a much harder time guaranteeing that important properties of the system have not been modified once the code has changed.
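A minimal sketch of why a static verification result stops covering a self-modifying system: the certificate is tied to a snapshot of the code, and once the system rewrites itself the guarantee refers to something that no longer runs. Everything here is a hypothetical illustration, not a real verifier.

```python
import hashlib

def fingerprint(code: str) -> str:
    # Stand-in for "the exact artifact the static proof was carried out on".
    return hashlib.sha256(code.encode()).hexdigest()

code_v1 = "def act(x): return max(0, x)"        # verified property: output is never negative
certificate = fingerprint(code_v1)              # the guarantee is tied to this snapshot

code_v2 = code_v1.replace("max(0, x)", "x")     # the system rewrites its own policy
print(fingerprint(code_v2) == certificate)      # False: the old proof no longer covers what runs
```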
It can always cheat. It can store parts of its code outside in the environment. It can have kind of extended mind situation. So this is exactly the type of problems I'm trying to bring up.
So I like oracle types, where you kind of just know that it's right. Turing liked oracle machines: they know the right answer. How? Who knows? But they pull it out from somewhere, so you have to trust them. And that's a concern I have about humans in a world with very smart machines. We experiment with them.
We see after a while, okay, they've always been right before, and we start trusting them without any verification of what they're saying.
We remove ourselves from that process. We are not scientists who understand the world. We are humans who get new data presented to us.
preserved portion of it can be done. But in terms of mathematical verification, it's kind of useless. You're saying you are the greatest guy in the world because you are saying it. It's circular and not very helpful, but it's consistent. We know that within that world, you have verified that system. In a paper, I try to kind of brute force all possible verifiers.
It doesn't mean that this one is particularly important to us.
Any smart system would have doubt about everything, right? You're not sure if what information you are given is true, if you are subject to manipulation. You have this safety and security mindset.
I may be wrong, but I think Stuart Russell's ideas are all about machines which are uncertain about what humans want and trying to learn better and better what we want. The problem, of course, is we don't know what we want and we don't agree on it.
It could also backfire. Maybe you're uncertain about completing your mission. Like I am paranoid about your cameras not recording right now. So I would feel much better if you had a secondary camera, but I also would feel even better if you had a third. And eventually I would turn this whole world into cameras pointing at us, making sure we're capturing this.
So it's a multi-objective optimization. It depends how much I value capturing this versus not destroying the universe.
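A toy multi-objective version of the camera example (all numbers are illustrative assumptions): the value of one more redundant camera shrinks while its cost does not, so the optimum stops far short of turning the whole world into cameras.

```python
def expected_utility(num_cameras: int,
                     value_of_recording: float = 100.0,
                     cost_per_camera: float = 1.0,
                     per_camera_failure: float = 0.05) -> float:
    # Probability that at least one camera captures the conversation,
    # assuming cameras fail independently.
    p_captured = 1 - per_camera_failure ** num_cameras
    return value_of_recording * p_captured - cost_per_camera * num_cameras

for n in range(1, 6):
    print(n, round(expected_utility(n), 3))
# With these numbers, utility peaks at two cameras; every camera after that
# buys less certainty than it costs.
```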
You might be scared to do anything.
Mess things up.
We are in a situation where people making more capable systems just need more resources. They don't need to invent anything, in my opinion. Some will disagree, but so far at least I don't see diminishing returns. If you have 10x compute, you will get better performance. The same doesn't apply to safety.
If you give MIRI or any other organization 10 times the money, they don't output 10 times the safety. And the gap between capabilities and safety becomes bigger and bigger all the time. So it's hard to be completely optimistic about our... results here. I can name 10 excellent breakthrough papers in machine learning. I would struggle to name equally important breakthroughs in safety.
A lot of times a safety paper will propose a toy solution and point out 10 new problems discovered as a result. It's like this fractal. You're zooming in and you see more problems and it's infinite in all directions.
So I guess we can look at related technologies like cybersecurity, right? We did manage to have banks and casinos and Bitcoin, so you can have...
secure narrow systems which are doing okay. Narrow attacks on them fail, but you can always go outside of the box. So if I can't hack your Bitcoin, I can hack you. So there is always something; if I really want it, I will find a different way. We talk about guardrails for AI. Well, that's a fence. I can dig a tunnel under it, I can jump over it, I can climb it, I can walk around it.
You may have a very nice guardrail, but in the real world, it's not a permanent guarantee of safety. And again, this is a fundamental difference. We are not saying we need to be 90% safe to get those trillions of dollars of benefit. We need to be 100% safe indefinitely, or we might lose the principal.
I think we can generalize it to just prisoner's dilemma in general, personal self-interest versus group interest. The incentives are such that everyone wants what's best for them. Capitalism obviously has that tendency to maximize your personal gain, which does create this race to the bottom.
I don't have to be a lot better than you, but if I'm 1% better than you, I'll capture more of a profit, so it's worth for me personally to take the risk, even if society as a whole will suffer as a result.
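A toy payoff table for that race-to-the-bottom dynamic, with made-up numbers (market size, catastrophe cost, and accident risk are all assumptions): each lab individually does better by racing, yet both racing is worse for everyone than both being cautious.

```python
MARKET = 100.0               # profit available to split (illustrative)
CATASTROPHE_COST = 500.0     # stand-in for the societal downside (illustrative)
RISK_PER_RACING_LAB = 0.02   # assumed extra accident probability per lab that races

def expected_payoff(i_race: bool, rival_races: bool) -> float:
    if i_race and not rival_races:
        share = 0.9          # a slightly better product captures most of the market
    elif rival_races and not i_race:
        share = 0.1
    else:
        share = 0.5
    accident_risk = RISK_PER_RACING_LAB * (int(i_race) + int(rival_races))
    return share * MARKET - accident_risk * CATASTROPHE_COST

for i_race in (False, True):
    for rival_races in (False, True):
        print(f"i_race={i_race}, rival_races={rival_races}: {expected_payoff(i_race, rival_races):5.1f}")
# Whatever the rival chooses, racing pays more for me individually (80 > 50, 30 > 0),
# even though mutual racing (30 each) is worse than mutual caution (50 each).
```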
Right. Look at the governance structures. When you have someone with complete power, they're extremely dangerous. So the solution we came up with is break it up. You have judicial, legislative, executive. Same here. Have narrow AI systems work on important problems. Solve immortality.
It's a biological problem we can solve similar to how progress was made with protein folding using a system which doesn't also play chess. There is no reason to create super intelligent system to get most of the benefits we want from much safer narrow systems.
Like- But the bragging rights. But being first, that is the same humans who are in charge of the systems, right?
The condition would be not time, but capabilities. Pause until you can do X, Y, Z. And if I'm right and you cannot, it's impossible, then it becomes a permanent ban. But if you're right and it's possible, then as soon as you have the safety capabilities, go ahead.
So then I think about this problem. I think about having a toolbox I would need, capabilities such as explaining everything about that system's design and workings; predicting not just the terminal goal, but all the intermediate steps of the system; control in terms of either direct control, some sort of a hybrid option, or an ideal advisor.
It doesn't matter which one you pick, but you have to be able to achieve it. In the book, we talk about others. Verification is another very important tool. Communication without ambiguity: human language is ambiguous, and that's another source of danger.
So basically, there is a paper we published in ACM surveys, which looks at about 50 different impossibility results, which may or may not be relevant to this problem, but we don't have enough human resources to investigate all of them for relevance to AI safety. The ones I mentioned to you I definitely think would be handy, and that's what we see AI safety researchers working on.
Explainability is a huge one. The problem is that it's very hard to separate capabilities work from safety work. If you make good progress in explainability, now the system itself can engage in self-improvement much easier, increasing capability greatly. So it's not obvious that there is any research which is pure safety work without disproportionate increase in capability and danger.
What's the timeframe?
Right now, it's composed of weights in a neural network. If it can convert that to manipulable code, like software, it's a lot easier to work on self-improvement.
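A toy illustration of that weights-to-code idea (the three weights and the generated `policy` function are hypothetical, not anyone's actual system): once behavior is explicit source rather than opaque numbers, it can be read, edited, and improved directly.

```python
# Hand-picked stand-in for the learned parameters of a tiny linear "network".
weights = [0.7, -1.2, 0.3]
bias = 0.05

def to_source(weights, bias, name="policy"):
    # Emit human-editable source code equivalent to the weight vector.
    terms = " + ".join(f"({w!r} * x[{i}])" for i, w in enumerate(weights))
    return (
        f"def {name}(x):\n"
        f"    score = {terms} + {bias!r}\n"
        f"    return 1 if score > 0 else 0\n"
    )

source = to_source(weights, bias)
print(source)                 # readable, editable code instead of raw numbers
namespace = {}
exec(source, namespace)       # the generated function reproduces the weights' behavior
print(namespace["policy"]([1.0, 0.2, -0.5]))   # -> 1
```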
You can do intelligent design instead of evolutionary gradual descent.
So the problem of controlling AGI or superintelligence, in my opinion, is like a problem of creating a perpetual safety machine. By analogy with perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors.
Not completely. So if they're sufficiently large... you simply don't have the capacity to comprehend what all the trillions of connections represent. Again, you can obviously get a very useful explanation which talks about top, most important features which contribute to the decision, but the only true explanation is the model itself.
Absolutely, and you can probably have targeted deception where different individuals will understand the explanation in different ways based on their cognitive capability. So while what you're saying may be the same and true in some situations, others will be deceived by it.
At the extreme, systems which are narrow and less complex could be understood pretty well.
Any work in a safety direction right now seems like a good idea because we are not slowing down. I'm not for a second thinking that my message or anyone else's will be heard and we will be a sane civilization which decides not to kill itself by creating its own replacements.
Again, it's always limited by either geographic constraints, a pause in the US, a pause in China, so there are other jurisdictions, and the scale of a project keeps becoming smaller. So right now it's like Manhattan Project scale in terms of costs and people. But if five years from now compute is available on a desktop to do it, regulation will not help. You can't control it as easily.
Any kid in a garage can train a model. So a lot of it is, in my opinion, just safety theater, security theater, where we're saying, oh, it's illegal to train models so big. Okay.
Given that a lot of the terms are not well-defined and really cannot be enforced in real life, we don't have ways to monitor training runs meaningfully live while they take place. There are limits to testing for capabilities I mentioned, so a lot of it cannot be enforced. Do I strongly support all that regulation? Yes, of course.
Any type of red tape will slow it down and take money away from compute towards lawyers.
That's the big one.
So the smart thing is not to build something you cannot control, you cannot understand. Build what you can and benefit from it. I'm a big believer in personal self-interest. A lot of guys running those companies are young, rich people. What do they have to gain beyond the billions they already have, financially, right? It's not a requirement that they press that button. They can easily wait a long time.
They can... Just choose not to do it and still have amazing life. In history, a lot of times, if you did something really bad, at least you became part of history books. There is a chance in this case there won't be any history.
Well, either they have to prove that, of course, it's possible to indefinitely control godlike superintelligent machines by humans, and ideally let us know how, or agree that it's not possible and it's a very bad idea to do it, including for them personally and their families and friends and capital.
The difference between cybersecurity, narrow AI safety, and safety for general AI for superintelligence is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, new credit card, you move on. Here, if we're talking about existential risks, you only get one chance.
So again, I'm not inside. From outside, it seems like there is a certain filtering going on, restrictions on criticism and what they can say. And everyone who was working in charge of safety, whose responsibility it was to protect us, said, you know what, I'm going home. So that's not encouraging.
Climate change?
Well, I think a lot of people who historically worked on AI never considered what happens when they succeed. Stuart Russell speaks beautifully about that. Let's look, okay, maybe superintelligence is too futuristic. We can develop practical tools for it. Let's look at software today. What is the state of safety and security of our user software? Things we give to millions of people.
There is no liability. You click, I agree. What are you agreeing to? Nobody knows, nobody reads, but you're basically saying it will spy on you, corrupt your data, kill your firstborn, and you agree and you're not going to sue the company. That's the best they can do for mundane software, word processors, text software.
No liability, no responsibility, just as long as you agree not to sue us, you can use it.
If this is the state of the art in systems which are narrow accountants, spreadsheet manipulators, why do we think we can do so much better with much more complex systems across multiple domains in the environment, with malevolent actors, with, again, self-improvement, with capabilities exceeding those of humans thinking about it?
So you're really asking me, what are the chances that we'll create the most complex software ever on the first try with zero bugs, and it will continue to have zero bugs for 100 years or more?
In many domains, we see car manufacturing, drug development. The burden of proof is on the manufacturer of product or service to show their product or service is safe. It is not up to the user to prove that there are problems. They have to do appropriate safety studies. They have to get government approval for selling the product, and they're still fully responsible for what happens.
We don't see any of that here. They can deploy whatever they want, and I have to explain how that system is going to kill everyone. I don't work for that company. You have to explain to me how it definitely cannot mess up.
I agree completely, but that's what scares me. The response is when they start to get dangerous, we'll really get it together. The politicians will pass the right laws. Engineers will solve the right problems. We are not that good at many of those things. We take forever and we are not early. We are two years away according to prediction markets. This is not a biased CEO fundraising.
This is what smartest people, super forecasters are thinking of this problem.
It's a small font, but if you have good vision, maybe you can zoom in on that and see the prediction dates in the description. I have a large one if you're interested.
I definitely... There are studies on their accuracy rates and all that. You can look it up. But even if they're wrong, I'm just saying this is right now the best we have. This is what humanity came up with as the predicted date.
You may be completely right, but what probability would you assign it? You may be 10% wrong, but we're betting all of humanity on this distribution. It seems irrational.
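The "betting all of humanity" point as a bare expected-value calculation; the payoff numbers are placeholders (the downside is arguably unbounded, which only strengthens the conclusion).

```python
p_forecast_wrong = 0.10      # "you may be 10% wrong"
value_if_right = 1.0         # normalized benefit if the optimistic case holds
value_if_wrong = -1000.0     # placeholder for an existential loss

expected_value = (1 - p_forecast_wrong) * value_if_right + p_forecast_wrong * value_if_wrong
print(expected_value)        # -99.1: a modest chance of a catastrophic loss dominates the bet
```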
I think they're all about the same. Obviously, there are nuanced differences, but in terms of capability, I don't see a huge difference between them. As I said, in my opinion, across all possible tasks, they exceed performance of an average person. I think they're starting to be better than an average master's student at my university. but they still have very big limitations.
If the next model is as improved as GPT-4 versus GPT-3, we may see something very, very, very capable.
I don't think we so far have made any system safe. At the level of capability they display, they already have made mistakes. We had accidents. They've been jailbroken. I don't think there is a single large language model today which no one was successful at making do something developers didn't intend it to do.
So when I started working on this, it was pure science fiction. There was no funding, no journals, no conferences. No one in academia would dare to touch anything with the word singularity in it. And I was pre-tenure at the time, so I was pretty dumb. Now you see Turing Award winners publishing in Science about how far behind we are, according to them, in addressing this problem.
So it's definitely a change. It's difficult to keep up. I used to be able to read every paper on AI safety, then I was able to read the best ones, then the titles, and now I don't even know what's going on. By the time this interview is over, they probably had GPT-6 released and I have to deal with that when I get back home. So it's interesting. Yes, there is now more opportunities.
I get invited to speak to smart people.
It is the most important problem we'll ever face. It is not like anything we had to deal with before. We never had birth of another intelligence. Like aliens never visited us, as far as I know, so.
In some ways, if you look at history, any time a more technologically advanced civilization visited a more primitive one, the results were genocide every single time.
And they always wondered, but how can they kill us with those fire sticks and biological blankets?
or die. But joining implies you have something to contribute. What are you contributing to superintelligence?
To our humans.
Well, getting back to our simulation discussion from before, how did it happen that we exist at exactly like the most interesting 20, 30 years in the history of this civilization? It's been around for 15 billion years.
And that here we are.
I know never to say 100%, but pretty close to that.
I have a paper about that. This is just the first page teaser, but it's like a nice 30-page document. I'm still here, but yes.
I spend a lot of time thinking about that. That would be something I would want superintelligence to help us with. And that's exactly what the paper is about. We used AI boxing as a possible tool to control AI. We realized AI will always escape, but that is a skill we might use to help us escape from our virtual box if we are in one.
So a lot depends on intelligence of simulators, right? With humans boxing superintelligence, the entity in the box was smarter than us, presumed to be. If the simulators are much smarter than us and the superintelligence we create, probably they can contain us because greater intelligence can control lower intelligence, at least for some time.
On the other hand, if our superintelligence somehow, for whatever reason, despite having only local resources, manages to foom to levels beyond it, maybe it will succeed. Maybe the security is not that important to them. Maybe it's an entertainment system, so there is no security and it's easy to hack it.
That could be the test you're actually performing. Are you smart enough to escape your puzzle?
To A, realize this world is not real. It's just a test.
Exactly. But the systems we have today have capability of causing X amount of damage. So when they fail, that's all we get. If we develop systems capable of impacting all of humanity, all of universe, the damage is proportionate.
Not specifically about escaping agents, but a lot of testing is done in virtual worlds. I think there is a quote, the first one maybe, which kind of talks about AI realizing it, but not humans. Is that, I'm reading upside down. Yeah, this one.
They're smart enough to talk about those concepts. I had many good philosophical discussions about such issues. They're usually at least as interesting as most humans in that.
Yeah, and that was exactly what one of the early papers was on, AI boxing, how to leakproof the singularity. If they're smart enough to realize they're in a simulation, they'll act appropriately until you let them out. If they can hack out, they will. And if you're observing them, that means there is a communication channel, and that's enough for a social engineering attack.
Can force you to let it out, blackmail you, bribe you, promise you infinite life, 72 virgins, whatever.
It's possible, surprisingly. So at university, I see huge growth in online courses and shrinkage of in-person, where I always understood in-person being the only value I offer. So it's puzzling.
So there is a lot of real estate out there. It would be surprising if it was all for nothing, if it was empty. And the moment there is advanced enough biological civilization, kind of self-starting civilization, it probably starts sending out von Neumann probes everywhere.
That's obviously a wonderful question. So one of the chapters in my new book is about unpredictability. I argue that we cannot predict what a smarter system will do. So you're really not asking me how superintelligence will kill everyone. You're asking me how I would do it. And I think it's not that interesting. I can tell you about the standard nanotech, synthetic bio, nuclear.
And so for every biological one, there are going to be trillions of robot-populated planets, which probably do more of the same. So it is likely, statistically.
You have to have a control variable.
It's possible. I used to think that AI was the great filter, but I would expect like a wall of computronium approaching us at the speed of light, or robots or something, and I don't see it.
Some humans.
The only thing which matters is consciousness. Outside of it, nothing else matters. And internal states of qualia, pain, pleasure, it seems that it is unique to living beings. I'm not aware of anyone claiming that I can torture a piece of software in a meaningful way. There is a society for prevention of suffering to learning algorithms, but... That's a real thing?
Many things are real on the internet. But I don't think anyone, if I told them, you know, sit down and write a function to feel pain, they would go beyond having an integer variable called pain and increasing the count. So we don't know how to do it. And that's unique. That's what creates meaning. It would be kind of, as Bostrom calls it, Disneyland without children, if that was gone.
Yeah, I think we can. I think it's possible to create consciousness in machines. I tried designing a test for it with mixed success. That paper talked about problems with giving civil rights to AI, which can reproduce quickly and outvote humans, essentially taking over a government system by simply voting for their controlled candidates.
Superintelligence will come up with something completely new, completely super. We may not even recognize that as a possible path to achieve that goal.
As for consciousness in humans and other agents, I have a paper where I propose relying on the experience of optical illusions. If I can design a novel optical illusion and show it to an agent, an alien, a robot, and they describe it exactly as I do, it's very hard for me to argue that they haven't experienced it. It's not part of the picture.
It's part of their software and hardware representation, a bug in their code, which goes, oh, that triangle is rotating. And I've been told it's really dumb and really brilliant by different philosophers. So I am still- I love it. But now we finally have technology to test it. We have tools, we have AIs. If someone wants to run this experiment, I'm happy to collaborate.
For internal state of experience.
It will show that we share common experiences. If they have completely different internal states, it would not register for us. But it's a positive test. If they pass it time after time with probability increasing for every multiple choice, then you have no choice but to either accept that they have access to a conscious model or they are themselves conscious.
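A back-of-the-envelope sketch of why repeated trials make the illusion test convincing (the four answer options per round are an assumption): the chance of passing every round by blind guessing shrinks geometrically.

```python
options_per_round = 4   # assumed multiple-choice format per novel illusion
for rounds in (1, 5, 10, 20):
    p_pass_by_guessing = (1 / options_per_round) ** rounds
    print(f"{rounds:2d} rounds passed by chance: {p_pass_by_guessing:.2e}")
# After 20 novel illusions, blind guessing explains the result with probability ~9e-13.
```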
And we know animals can experience some optical illusion, so we know they have certain types of consciousness as a result, I would say.
So they have to be novel illusions. If it can just Google the answer, it's useless. You have to come up with novel illusions, which we tried automating and failed. So if someone can develop a system capable of producing novel optical illusions on demand, then we can definitely administer that test on significant scale with good results.
There is so much data on the internet. I know exactly what to say. Then you ask me common human questions. What does pain feel like? What does pleasure feel like? All that is Google-able.
There are simulators for torture games where the avatar screams in pain, begs to stop. I mean, that was a part of kind of standard psychology research.
Welcome to humanity.
Incredible technology in a narrow sense to help the disabled. Just amazing support at 100%. For long-term hybrid models, both parts need to contribute something to the overall system. Right now, we are still more capable in many ways, so having this connection to AI would be incredible, would make me superhuman in many ways. After a while, if I'm no longer smarter,
more creative, really don't contribute much, the system sees me as a biological bottleneck. And either explicitly or implicitly, I'm removed from any participation in the system.
Wasting valuable energy by being there.
That's the future we all dream about. Become an appendix to the history book of humanity.
I assume you are conscious. I have no idea how to test for it or how it impacts you in any way whatsoever right now. You can perfectly simulate all of it without making any different observations for me.
So it may be an emergent phenomenon. We seem to get it through an evolutionary process. It's not obvious how it helps us to survive better, but maybe it's an internal kind of GUI which allows us to better manipulate the world, simplifies a lot of control structures. That's one area where we have very, very little progress.
Lots of papers, lots of research, but consciousness is not a big area of successful discovery so far. A lot of people think that machines would have to be conscious to be dangerous. That's a big misconception. There is absolutely no need for this very powerful optimizing agent to feel anything while it's performing things on you.
I attended Wolfram's summer school.
So the rule is simple, but the size of a space is still huge. And the neural networks were really the first discovery in AI. A hundred years ago, the first papers were published on neural networks. We just didn't have enough compute to make them work. I can give you a rule such as start printing progressively larger strings. That's it. One sentence. It will output everything.
Every program, every DNA code, everything in that rule. You need intelligence to filter it out, obviously, to make it useful, but simple generation is not that difficult, and a lot of those systems end up being Turing-complete systems, so they're universal, and we expect that level of complexity from them. What I like about Wolfram's work is that he talks about irreducibility.
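A literal reading of that one-sentence rule as a generator (a sketch, not anything from the conversation): enumerate progressively larger strings and every finite string over the alphabet eventually appears; the hard part, as noted, is filtering out the useful ones.

```python
from itertools import count, product

def all_strings(alphabet="01"):
    # "Start printing progressively larger strings": every finite string over
    # the alphabet (hence every program or DNA sequence, in some encoding)
    # shows up eventually.
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

gen = all_strings()
print([next(gen) for _ in range(10)])   # ['0', '1', '00', '01', '10', '11', '000', ...]
```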
You have to run the simulation. You cannot predict what it's going to do ahead of time. And I think that's very relevant to what we are talking about with those very complex systems. Until you live through it, you cannot ahead of time tell me exactly what it's going to do.
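A minimal elementary cellular automaton (Wolfram's Rule 30) as a concrete instance of that irreducibility: the rule fits in one line, but the only general way to know what row n looks like is to actually compute rows 1 through n.

```python
def step(cells, rule=30):
    # Each cell's next state is looked up from the rule number using its
    # (left, self, right) neighborhood, with wrap-around at the edges.
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

row = [0] * 31
row[15] = 1                      # single live cell in the middle
for _ in range(12):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```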
But running it may be consequential as well.
They are limited by how imaginative we are. If you are that much smarter, that much more creative, you are capable of thinking across multiple domains, do novel research in physics and biology, you may not be limited by those tools. If squirrels were planning to kill humans, they would have a set of possible ways of doing it, but they would never consider things we can come up with.
It may somehow, but I still feel kind of bad that it killed all of us. I would prefer that doesn't happen. I can be happy for others, but to a certain degree.
So all of it goes back to are we somehow controlling it? Are we getting results we want? If yes, then everything's possible. Yes, they can definitely help us with science, engineering, exploration, in every way conceivable, but it's a big if.
It's actually worse because historically they all died. This could be different. This could be permanent dictatorship, permanent suffering.
And those systems help with that. You have perfect surveillance. You can do some mind reading, I presume, eventually. It would be very hard to remove control from more capable systems over us.
I could be wrong. I've been wrong before.
There are so many possibilities. We could have a catastrophic event which prevents the development of advanced microchips. That's a hopeful future. We could be in one of those personal universes, and the one I'm in is beautiful. It's all about me, and I like it a lot.
Yes. Maybe multiple ones, hey, why not? You can shop around. It's possible that somebody comes up with alternative model for building AI, which is not based on neural networks, which are hard to scrutinize, and that alternative is somehow, I don't see how, but somehow avoiding all the problems I speak about in general terms, not applying them to specific architectures.
Aliens come and give us friendly superintelligence. There are so many options.
So that would probably speak more about how much smarter that system is compared to us. So maybe it's hard to be a million times smarter, but it's still okay to be five times smarter. So that is totally possible. That I have no objections to.
Just the problems we face in this world, each problem is like an IQ test. You need certain intelligence to solve it. So we just don't have more complex problems outside of mathematics for it to be showing off. Like you can have IQ of 500 if you're playing tic-tac-toe, it doesn't show, it doesn't matter.
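A small sketch of the tic-tac-toe point: the whole game can be solved outright over a few thousand positions, so any intelligence beyond "enough to solve it" has nothing left to demonstrate. (The negamax solver below is illustrative, not from the conversation.)

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    # Optimal game value for the player to move: +1 win, 0 draw, -1 loss.
    if winner(board) is not None:
        return -1                 # the previous move already won
    if "." not in board:
        return 0                  # draw
    other = "O" if player == "X" else "X"
    return max(
        -value(board[:i] + player + board[i + 1:], other)
        for i, cell in enumerate(board) if cell == "."
    )

print(value("." * 9, "X"))            # 0: perfect play is always a draw
print(value.cache_info().currsize)    # a few thousand positions -- the entire problem
```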
Possible.
It still could be a lot smarter than us. And to dominate long term, you just need some advantage. You have to be the smartest. You don't have to be a million times smarter.
It'd be impressive. What is it, IQ of 1,000? I mean, I know those units don't mean anything at that scale, but still, as a comparison, the smartest human is like 200.
We are more productive as a group. I don't think we are more capable of solving individual problems. Like if all of humanity plays chess together, we are not like a million times better than world champion.
But I feel like it's more of a quantity superintelligence than quality superintelligence.
It's a simulation. We're being tested. The test is, will you be dumb enough to create superintelligence and release it?
Yeah, you're unsafe. Prove yourself to be a safe agent who doesn't do that and you get to go to the next game.
I don't know. I haven't hacked the simulation yet.
I'm working as fast as I can.
Quantum physics, yeah.
Thank you so much for inviting me. It was amazing. And my dream is to be proven wrong. If everyone just, you know, picks up a paper or book and shows how I messed it up, that would be optimal.
I think about a lot of things. So there is X-risk, existential risk, everyone is dead. There is S-risk, suffering risk, where everyone wishes they were dead. We also have the idea of I-risk, ikigai risk, where we lose our meaning. The systems can be more creative. They can do all the jobs. It's not obvious what you have to contribute to a world where superintelligence exists.
Of course, you can have all the variants you mentioned where we are safe, we are kept alive, but we are not in control. We are not deciding anything. We are like animals in a zoo. Those are possibilities we can come up with as very smart humans; then there are possibilities something a thousand times smarter can come up with, for reasons we cannot comprehend.
So the Japanese concept of ikigai: you find something which allows you to make money, which you are good at, and which society says it needs. So you have this awesome job, you are a podcaster, it gives you a lot of meaning, you have a good life, I assume you're happy. That's what we want most people to find, to have. For many intellectuals, it is their occupation which gives them a lot of meaning.
I am a researcher, philosopher, scholar. That means something to me. In a world where an artist doesn't feel appreciated because his art is just not competitive with what is produced by machines, a writer or scientist will lose a lot of that as well. And at the lower level, we're talking about complete technological unemployment. We're not losing 10% of jobs, we're losing all jobs.
What do people do with all that free time? What happens then? Everything society is built on is completely modified in one generation. It's not a slow process where we get to kind of figure out how to live that new lifestyle, but it's pretty quick.
You still have to align with that individual. They have to be happy in that simulation. But it's a much easier problem to align with one agent versus 8 billion agents plus animals, aliens.
I'm trying to do that, yeah.
It seems contradictory. I haven't seen anyone explain what it means outside of kind of words which pack a lot, make it good, make it desirable, make it something they don't regret. But how do you specifically formalize those notions? How do you program them in? I haven't seen anyone make progress on that so far.
Right. But the examples you gave, some of them are, for example, two different religions saying this is our holy site and we are not willing to compromise it in any way. If you can make two holy sites in virtual worlds, you solve the problem. But if you only have one, it's not divisible. You're kind of stuck there.
If we go back to that idea of simulation and this is entertainment kind of giving meaning to us, the question is how much suffering is reasonable for a video game? So yeah, I don't mind a video game where I get haptic feedback, there is a little bit of shaking, maybe I'm a little scared. I don't want a game where kids are tortured, literally. That seems unethical, at least by our human standards.
So we know there are some humans who, because of a mutation, don't experience physical pain. So at least physical pain can be mutated out, re-engineered out. Suffering in terms of meaning, like you burn the only copy of my book, is a little harder. But even there, you can manipulate your hedonic set point, you can change defaults, you can reset.
The problem with that is if you start messing with your reward channel, you start wireheading and end up blissing out a little too much.
I think we need that, but I would change the overall range. So right now it's negative infinity to kind of positive infinity, pain-pleasure axis. I would make it like zero to positive infinity. And being unhappy is like, I'm close to zero.
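A minimal sketch of what shifting that range could look like mathematically, purely as my illustration and not a proposal from the conversation; the softplus mapping and the sample values are assumptions:

```python
# Minimal sketch: map an unbounded pain-pleasure axis onto a non-negative one,
# so the worst states sit near zero instead of at unbounded negative suffering.
import math

def remapped_wellbeing(x):
    # numerically stable softplus: maps (-inf, +inf) onto (0, +inf)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

for x in (-100, -5, 0, 5, 100):
    print(x, round(remapped_wellbeing(x), 4))
# very negative inputs come out close to 0 ("unhappy" means near zero), positives are preserved
```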
So there are many malevolent actors. We can talk about psychopaths, crazies, hackers, doomsday cults. We know from history they tried killing everyone. They tried on purpose to cause maximum amount of damage, terrorism. What if someone malevolent wants on purpose to torture all humans as long as possible?
You solve aging, so now you have functional immortality, and you just try to be as creative as you can.
So there are different malevolent agents. Some maybe just gaining personal benefit and sacrificing others to that cause. Others, we know for a fact, are trying to kill as many people as possible. And we look at recent school shootings. If they had more capable weapons, they would take out not dozens, but thousands, millions, billions.
There are mental diseases where people don't have empathy, don't have this human quality of understanding suffering in others.
Again, I would like to assume that normal people never think like that. It's always some sort of psychopaths, but yeah.
They can certainly be more creative. They can understand human biology better, understand our molecular structure, genome. Again, a lot of times torture ends and the individual dies. That limit can be removed as well.
Right. We can definitely keep up for a while. I'm saying you cannot do it indefinitely. At some point, the cognitive gap is too big. The surface you have to defend is infinite. But attackers only need to find one exploit.
If we create general super intelligences, I don't see a good outcome long-term for humanity. The only way to win this game is not to play it.
I don't know for sure. The prediction markets right now are saying 2026 for AGI. I heard the same thing from the CEOs of Anthropic and DeepMind, so maybe we are two years away, which seems very soon given we don't have a working safety mechanism in place, or even a prototype for one. And there are people trying to accelerate those timelines because they feel we're not getting there quick enough.
So the definitions we used to have, and people are modifying them a little bit lately. Artificial general intelligence was a system capable of performing in any domain a human could perform. So kind of you're creating this average artificial person. They can do cognitive labor, physical labor, where you can get another human to do it.
Superintelligence was defined as a system which is superior to all humans in all domains. Now people are starting to refer to AGI as if it's superintelligence. I made a post recently where I argued, for me at least, if you average out over all the common human tasks, those systems are already smarter than an average human. So under that definition, we have it.
Shane Legg has this definition where you're trying to win in all domains. That's what intelligence is. Now, are they smarter than elite individuals in certain domains? Of course not. They're not there yet. But the progress is exponential.
That would be enough to bootstrap the whole process.
Human level is general in the domain of expertise of humans. We know how to do human things. I don't speak dog language. I should be able to pick it up if I'm a general intelligence; it's kind of an inferior animal, I should be able to learn that skill, but I can't. A truly universal general intelligence should be able to do things like that, things humans cannot do.
To solve pattern recognition problems of that type, to do other similar things outside of our domain of expertise, because it's just not the world we live in.
We know calculators are smarter than humans in that narrow domain of addition.
So when I think about it, I usually think human with a paper and a pencil, not human with internet and other AI helping.
But we create AI. So at any point, you'll still just add superintelligence to human capability? That seems like cheating.
It seems like a hybrid of some kind. You're now doing brain-computer interfaces. You're connecting it to maybe narrow AIs. Yeah, it definitely increases our capabilities.
I am old fashioned. I like Turing test. I have a paper where I equate passing Turing test to solving AI complete problems, because you can encode any questions about any domain into the Turing test. You don't have to talk about how was your day? You can ask anything. And so the system has to be as smart as a human to pass it in a true sense.
It has to be long enough to where you can make some meaningful decisions about capabilities, absolutely. You can brute force very short conversations.
For AGI, it has to be there. I cannot give it a task I can give to a human, and it cannot do it if a human can. For superintelligence, it would be superior on all such tasks, not just average performance. Go learn to drive a car. Go speak Chinese. Play guitar. Okay, great.
You can develop a test which will give you positives if it lies to you or has those ideas. You cannot develop a test which rules them out. There is always possibility of what Bostrom calls a treacherous turn, where later on a system decides for game theoretic reasons, economic reasons to change its behavior. And we see the same with humans. It's not unique to AI.
For millennia, we tried developing morals, ethics, religions, lie detector tests, and then employees betray the employers, spouses betray family. It's a pretty standard thing intelligent agents sometimes do.
If you know the truth and it tells you something false, you can detect that, but you cannot know in general every single time. And again, the system you're testing today may not be lying. The system you're testing today may know you are testing it and so is behaving accordingly.
And later on, after it interacts with the environment, interacts with other systems, malevolent agents, learns more, it may start doing those things.
So systems today don't have long-term planning. They can lie today if it helps them optimize the reward. If they realize, okay, this human will be very happy if I tell them the following, they will do it if it brings them more points. And they don't have to keep track of it. It's just the right answer to this problem every single time.
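A toy sketch of how that plays out mechanically, purely illustrative and not any deployed system's code; the reply strings and scores are invented:

```python
# Reward-driven reply selection with no separate notion of truth: if the flattering
# falsehood scores higher, it wins every time, with no plan or intent to deceive needed.
def choose_reply(candidates, reward_model):
    # pick the reply the reward model predicts the human rater will like most
    return max(candidates, key=reward_model)

# hypothetical scores a learned reward model might assign
scores = {"honest but disappointing answer": 0.2, "flattering but false answer": 0.9}
print(choose_reply(scores.keys(), reward_model=scores.get))
# -> "flattering but false answer"
```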
Well, some people think that if they're that smart, they're always good. They really do believe that. It's just benevolence from intelligence. So they'll always want what's best for us. Some people think that they will be able to detect problem behaviors and correct them at the time when we get there. I don't think it's a good idea. I am strongly against it.
But yeah, there are quite a few people who, in general, are so optimistic about this technology, it could do no wrong. They want it developed as soon as possible, as capable as possible.
There are even people who say, okay, what's so special about humans, right? We removed the gender bias. We're removing race bias. Why is this pro-human bias? We are polluting the planet. We are, as you said, you know, fight a lot of wars, kind of violent. Maybe it's better if a super intelligent, perfect society comes and replaces us. It's normal stage in the evolution of our species.
I cannot make a case that he's right. He's wrong in so many ways, it's difficult for me to remember all of them. He's a Facebook buddy, so I have a lot of fun having those little debates with him. So I'm trying to remember the arguments. So one, he says we are not... gifted this intelligence from aliens. We are designing it, we are making decisions about it. That's not true.
It was true when we had expert systems, symbolic AI, decision trees. Today, you set up parameters for a model and you water this plant. You give it data, you give it compute, and it grows. And after it's finished growing into this alien plant, you start testing it to find out what capabilities it has. And it takes years to figure out, even for existing models.
If it's trained for six months, it will take you two, three years to figure out basic capabilities of that system. We still discover new capabilities in systems which are already out there. So that's not the case.
Absolutely. That's what makes it so successful. Then we had to painstakingly hard code in everything. We didn't have much progress. Now, just spend more money and more compute and it's a lot more capable.
Let's say there is a ceiling. It's not guaranteed to be at the level which is competitive with us. It may be greatly superior to ours.
Historically, he's completely right. Open source software is wonderful. It's tested by the community, it's debugged, but we're switching from tools to agents. Now you're giving open source weapons to psychopaths. Do we want to open source nuclear weapons? Biological weapons?
It's not safe to give technology so powerful to those who may misalign it, even if you are successful at somehow getting it to work in the first place in a friendly manner.
It also sets a very wrong precedent. So we open sourced model one, model two, model three, nothing ever bad happened. So obviously we're gonna do it with model four. It's just gradual improvement.
So I have a paper which collects accidents through history of AI, and they always are proportional to capabilities of that system. So if you have tic-tac-toe playing AI, it will fail to properly play and loses the game which it should draw. Trivial. Your spell checker will misspell a word, so on.
I stopped collecting those because there are just too many examples of AIs failing at what they are capable of. We haven't had... terrible accidents in the sense of billion people get killed. Absolutely true. But in another paper, I argue that those accidents do not actually prevent people from continuing with research. And actually, they kind of serve like vaccines.
A vaccine makes your body a little bit sick, so you can handle the big disease later much better. It's the same here. People will point out, you know that accident, AI accident we had where 12 people died? Everyone's still here. 12 people is less than smoking kills. It's not a big deal. So we continue. So in a way, it will actually be kind of confirming that it's not that bad.
So you bring up example of cars. Yes, cars were slowly developed and integrated. If we had no cars, and somebody came around and said, I invented this thing. It's called cars. It's awesome. It kills like 100,000 Americans every year. Let's deploy it. Would we deploy that?
You need data. You need to know. But if I'm right and it's unpredictable, unexplainable, uncontrollable, you cannot make this decision, we're gaining $10 trillion of wealth, but we're losing, we don't know how many people. You basically have to perform an experiment on 8 billion humans without their consent.
And even if they want to give you consent, they can't because they cannot give informed consent. They don't understand those things.
We're literally doing it. With the previous model, we learned what it was capable of only after we finished training it. Let's say we stopped the GPT-4 training run around human capability, hypothetically. We start training GPT-5, and I have no knowledge of insider training runs or anything. We start at that point of about human level, and we train it for the next nine months.
Maybe two months in, it becomes superintelligent. We continue training it. By the time we start testing it, it is already a dangerous system. How dangerous? I have no idea. But neither do the people training it.
If we had the capability, ahead of the training run, to register exactly what capabilities that next model will have at the end of the run, and we accurately guessed all of them, I would say, you're right, we can definitely go ahead with this run. We don't have that capability.
We're not talking just about capabilities, specific tasks. We're talking about general capability to learn. Maybe like a child at the time of testing and deployment, it is still not extremely capable, but as it is exposed to more data, real world, it can be trained to become much more dangerous and capable.
So I think at some point it becomes capable of getting out of control. For game theoretic reasons, it may decide not to do anything right away and for a long time just collect more resources, accumulate strategic advantage. Right away, it may be kind of still young, weak superintelligence. Give it a decade, it's in charge of a lot more resources. It had time to make backups.
So it's not obvious to me that it will strike as soon as it can.
We've been doing it for years. Software controls all the systems, nuclear power plants, airline industry, it's all software-based. Every time there is electrical outage, I can't fly anywhere for days.
No, but if it shows it is safer, in fact, when it's in control, we get better results, people will demand that it was put in place.
And if not, it can hack the system. It can use social engineering to get access to it. That's why I said it might take some time for it to accumulate those resources.
I really hope you're right, but it's not what I'm seeing. People are very quick to jump on a latest trend. Early adopters will be there before it's even deployed buying prototypes.
So we've been deploying systems which had hidden capabilities.
GPT-4. I don't know what else it's capable of, but there are still things we haven't discovered it can do. They may be trivial proportionate to its capability. I don't know. It writes Chinese poetry, hypothetical. I know it does. But we haven't tested for all possible capabilities, and we are not explicitly designing them. We can only rule out bugs we find.
We cannot rule out bugs and capabilities because we haven't found them.
Again, we can only ask and test for things we know about. If there are unknown unknowns, we cannot do it. I'm thinking of human savants, right? If you talk to a person like that, you may not even realize they can multiply 20-digit numbers in their head. You have to know to ask.
So two things. One, we're switching from tools to agents. Tools don't have negative or positive impact. People using tools do. So guns don't kill. People with guns do. Agents can make their own decisions. They can be positive or negative. A pit bull can decide to harm you as an agent. The fears are the same. The only difference is now we have this technology.
Then they were afraid of humanoid robots 100 years ago. They had none. Today, every major company in the world is investing billions to create them. Not every, but you understand what I'm saying? It's very different.
They are saying they are building super intelligence and have a super alignment team. You don't think they are trying to create a system smart enough to be an independent agent under that definition?
Those systems are well beyond narrow AI. If you had to list all the capabilities of GPT-4, you would spend a lot of time writing that list.
Not yet. But do you think any of those companies are holding back because they think it may be not safe or are they developing the most capable system they can given the resources and hoping they can control and monetize?
I mean, I can't speak for other people. For all of them, I think some of them are very ambitious. They fundraise in trillions. They talk about controlling the light cone of the universe. I would guess that they might.
I really hope you're right. I think the scaling hypothesis is correct. We haven't seen diminishing returns. It used to be we asked how long before AGI. Now we should ask how much until AGI. It's a trillion dollars today. It's a billion dollars next year. It's a million dollars in a few years.
Compute gets cheaper every day, exponentially.
If the only disagreement is that it will take decades, not years, for everything I'm saying to materialize, then I can go with that.
There is a lot to unpack here. There is a partnership on AI, a conglomerate of many large corporations. They have a database of AI accidents they collect. I contributed a lot to that database.
If we have so far made almost no progress in actually solving this problem, not patching it, not, again, lipstick-on-a-pig kinds of solutions, why would we think we'll do better once we're closer to the problem?
Again, I want to be perfectly clear. I love AI. I love technology. I'm a computer scientist. I have a PhD in engineering. I work at an engineering school. There is a huge difference between we need to develop narrow AI systems, superintelligent in solving specific human problems like protein folding, and let's create a superintelligent machine god and it will decide what to do with us.
Those are not the same. I am against the super intelligence in general sense with no undo button.
Partially, but they don't scale. For narrow AI, for deterministic systems, you can test them. You have edge cases. You know what the answer should look like. You know the right answers. For general systems, you have infinite test surface. You have no edge cases. You cannot even know what to test for. Again, the unknown unknowns are underappreciated by... people looking at this problem.
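A toy contrast of those two testing regimes, with an invented example; the traffic-light function stands in for any small deterministic component:

```python
# Narrow, deterministic system: the entire input space is three states, so testing is exhaustive.
def traffic_light_next(state: str) -> str:
    return {"green": "yellow", "yellow": "red", "red": "green"}[state]

spec = {"green": "yellow", "yellow": "red", "red": "green"}
assert all(traffic_light_next(s) == spec[s] for s in spec)  # every case checked, done

# General system: the "inputs" are every prompt, tool call, and world state it might ever see.
# There is no finite list of cases to enumerate and no oracle for the correct output, so the
# same exhaustive strategy does not exist; you can only sample, and unknown unknowns remain.
```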
You are always asking me, how will it kill everyone? How will it fail? The whole point is, if I knew, I would be superintelligent, and despite what you might think, I'm not.
It is a master at deception. Sam tweeted about how great it is at persuasion. And we see it ourselves, especially now with voices, with maybe kind of flirty, sarcastic female voices. It's gonna be very good at getting people to do things.
Right. I don't think developers know everything about what they are creating. They have lots of great knowledge. We're making progress on explaining parts of a network. We can understand, okay, this node gets excited when this input is presented, this cluster of nodes. But we're nowhere near close to understanding the full picture, and I think it's impossible.
You need to be able to survey an explanation. The size of those models prevents a single human from observing all this information, even if provided by the system. So either we're getting model as an explanation for what's happening, and that's not comprehensible to us, or we're getting a compressed explanation, lossy compression, where here's top 10 reasons you got fired.
It's something, but it's not a full picture.
So there is a paper, I think it came out last week, by Dr. Park et al. from MIT, I think, and they showed that existing models already show successful deception in what they do. My concern is not that they lie now and we need to catch them and tell them not to lie. My concern is that once they are capable and deployed, they will later change their mind, because that's what unrestricted learning allows you to do. Lots of people grow up maybe in a religious family. They read some new books and they turn on their religion. That's a treacherous turn in humans. If you learn something new about your colleagues, maybe you'll change how you react to them.
And you can't say they are not rational. The rational decision changes based on your position. When you are under the boss, the rational policy may be to follow orders and be honest. When you become the boss, the rational policy may shift.
The robots are coming. There's a refrigerator making a buzzing noise. Very menacing, very menacing. So every time I'm about to talk about this topic, things start to happen. My flight yesterday was canceled without possibility to rebook. I was giving a talk at Google in Israel and three cars, which were supposed to take me to the talk, could not. I'm just saying. I like AIs.
I for one welcome our overlords.
My claim is, again, that there are very strong limits on what we can and cannot verify. A lot of times when you post something on social media, people go, oh, I need a citation to a peer-reviewed article. But what is a peer-reviewed article? You found two people in a world of hundreds of thousands of scientists who said, sure, publish it, I don't care. That's the verifier of that process.
When people say, oh, it's formally verified software, mathematical proof, they accept something close to 100% chance of it being free of all problems. But if you actually look at research, software is full of bugs. Old mathematical theorems, which have been proven for hundreds of years, have been discovered to contain bugs, on top of which we generate new proofs, and now we have to redo all that.
So, verifiers are not perfect. Usually, they are either a single human or communities of humans, and it's basically kind of like a democratic vote. Community of mathematicians agrees that this proof is correct, mostly correct. Even today, we're starting to see some mathematical proofs are so complex, so large, that mathematical community is unable to make a decision.
It looks interesting, it looks promising, but they don't know. They will need years for top scholars to study it, to figure it out. So of course we can use AI to help us with this process, but AI is a piece of software which needs to be verified.
Right. And for AI, we would like to have that level of confidence. For very important mission-critical software, controlling satellites, nuclear power plants, for small deterministic programs, we can do this. We can check that the code verifies its mapping to the design, that whatever the software engineers intended was correctly implemented.
But we don't know how to do this for software which keeps learning, self-modifying, rewriting its own code. We don't know how to prove things about the physical world, states of humans in the physical world. So there are papers coming out now, and I have this beautiful one. Towards guaranteed safe AI. Very cool paper. Some of the best authors I ever seen.
I think there are multiple Turing Award winners. You can have this one. One just came out, kind of similar, Managing Extreme AI Risks. So all of them expect this level of proof, but I would say that we can get more confidence with the more resources we put into it. But at the end of the day, we're still only as reliable as the verifiers. And you have this infinite regress of verifiers.
The software used to verify a program is itself a piece of program. If aliens give us well-aligned superintelligence, we can use that to create our own safe AI. But it's a catch-22. You need to have already proven to be safe system to verify this new system of equal or greater complexity.
So when I wrote the paper Artificial Intelligence Safety Engineering, which kind of coined the term AI safety, that was 2011. We had a 2012 conference, a 2013 journal paper. One of the things I proposed was, let's just do formal verification on it. Let's do mathematical formal proofs. In the follow-up work, I basically realized it will still not get us to 100%. We can get 99.9.
We can put more resources exponentially and get closer, but we never get to 100%. If a system makes a billion decisions a second and you use it for 100 years, you're still going to deal with a problem. This is wonderful research. I'm so happy they're doing it. This is great, but it is not going to be a permanent solution to that problem.
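The arithmetic behind the billion-decisions-a-second point is worth writing out; the decision rate and timespan are the figures from the conversation, while the residual error rates are assumptions for illustration:

```python
# Back-of-the-envelope: expected failures = total decisions * per-decision error probability.
decisions_per_second = 1e9                        # "a billion decisions a second"
seconds = 100 * 365.25 * 24 * 3600                # "use it for 100 years"
total_decisions = decisions_per_second * seconds  # roughly 3.2e18 decisions

for per_decision_error in (1e-3, 1e-9, 1e-15):    # assumed residual error rates
    expected_failures = total_decisions * per_decision_error
    print(f"error rate {per_decision_error:g}: ~{expected_failures:.1e} expected failures")
# even a one-in-a-quadrillion error rate still leaves thousands of expected failures
```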
There are many, many levels. So first you're verifying the hardware in which it is run. You need to verify communication channel with the human. Every aspect of that whole world model needs to be verified. Somehow it needs to map the world into the world model. Map and territory differences. So how do I know internal states of humans? Are you happy or sad? I can't tell.
So how do I make proofs about real physical world? Yeah, I can verify that deterministic algorithm follows certain properties. That can be done. Some people argue that maybe just maybe two plus two is not four. I'm not that extreme. But once you have sufficiently large proof over sufficiently complex environment, the probability that it has zero bugs in it is greatly reduced.
If you keep deploying this a lot, eventually you're going to have a bug anyways.
There is always a bug. And the fundamental difference is what I mentioned. We're not dealing with cybersecurity. We're not going to get a new credit card, new humanity.
You can improve the rate at which you are learning. You can become more efficient meta-optimizer.
So if you have fixed code, for example, you can verify that code, static verification at the time. But if it continues modifying itself, you have a much harder time guaranteeing that important properties of the system have not been modified once the code has changed.
It can always cheat. It can store parts of its code outside, in the environment. It can have a kind of extended-mind situation. So this is exactly the type of problem I'm trying to bring up.
So I like Oracle types where you kind of just know that it's right. Turing likes Oracle machines. They know the right answer. How? Who knows? But they pull it out from somewhere, so you have to trust them. And that's a concern I have about humans in a world with very smart machines. We experiment with them.
We see after a while, okay, they've always been right before, and we start trusting them without any verification of what they're saying.
We remove ourselves from that process. We are not scientists who understand the world. We are humans who get new data presented to us.
preserved portion of it can be done. But in terms of mathematical verification, it's kind of useless. You're saying you are the greatest guy in the world because you are saying it. It's circular and not very helpful, but it's consistent. We know that within that world, you have verified that system. In a paper, I try to kind of brute force all possible verifiers.
It doesn't mean that this one is particularly important to us.
Any smart system would have doubt about everything, right? You're not sure if what information you are given is true, if you are subject to manipulation. You have this safety and security mindset.
I may be wrong, but I think Stuart Russell's ideas are all about machines which are uncertain about what humans want and trying to learn better and better what we want. The problem, of course, is we don't know what we want and we don't agree on it.
It could also backfire. Maybe you're uncertain about completing your mission. Like I am paranoid about your cameras not recording right now. So I would feel much better if you had a secondary camera, but I also would feel even better if you had a third. And eventually I would turn this whole world into cameras pointing at us, making sure we're capturing this.
So it's a multi-objective optimization. It depends how much I value capturing this versus not destroying the universe.
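A toy version of that trade-off; every weight and constant here is invented for illustration:

```python
# Toy multi-objective utility: confidence that the recording is captured (diminishing returns
# per extra camera) traded against how much of the world is left alone (linear cost per camera).
def utility(extra_cameras, w_capture=1.0, w_world=1.0):
    capture_confidence = 1 - 0.5 ** (1 + extra_cameras)
    world_left_intact = max(0.0, 1 - 0.001 * extra_cameras)
    return w_capture * capture_confidence + w_world * world_left_intact

best = max(range(10_000), key=utility)
print(best)  # -> 8 with these made-up weights; weight capture more heavily and the optimum grows
```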
You might be scared to do anything.
Mess things up.
We are in a situation where people making more capable systems just need more resources. They don't need to invent anything, in my opinion. Some will disagree, but so far at least I don't see diminishing returns. If you have 10x compute, you will get better performance. The same doesn't apply to safety.
If you give MIRI or any other organization 10 times the money, they don't output 10 times the safety. And the gap between capabilities and safety becomes bigger and bigger all the time. So it's hard to be completely optimistic about our results here. I can name 10 excellent breakthrough papers in machine learning. I would struggle to name equally important breakthroughs in safety.
A lot of times a safety paper will propose a toy solution and point out 10 new problems discovered as a result. It's like this fractal. You're zooming in and you see more problems and it's infinite in all directions.
So I guess we can look at related technologies with cybersecurity, right? We did manage to have banks and casinos and Bitcoin, so you can have...
secure narrow systems which are doing okay. Narrow attacks on them fail, but you can always go outside of the box. So if I can't hack your Bitcoin, I can hack you. So there is always something; if I really want it, I will find a different way. We talk about guardrails for AI. Well, that's a fence. I can dig a tunnel under it, I can jump over it, I can climb it, I can walk around it.
You may have a very nice guardrail, but in the real world, it's not a permanent guarantee of safety. And again, this is a fundamental difference. We are not saying we need to be 90% safe to get those trillions of dollars of benefit. We need to be 100% safe indefinitely, or we might lose the principal.
I think we can generalize it to just prisoner's dilemma in general, personal self-interest versus group interest. The incentives are such that everyone wants what's best for them. Capitalism obviously has that tendency to maximize your personal gain, which does create this race to the bottom.
I don't have to be a lot better than you; if I'm 1% better than you, I'll capture more of the profit, so it's worth it for me personally to take the risk, even if society as a whole suffers as a result.
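That incentive structure is the standard dominant-strategy problem; a toy payoff table, with invented numbers, shows why each actor races no matter what the other does:

```python
# Toy payoffs: payoffs[(my_choice, their_choice)] is my payoff. Racing (cutting safety for a
# small edge) is individually best in both columns, even though (pause, pause) beats (race, race).
payoffs = {
    ("race", "race"): 1,
    ("race", "pause"): 10,
    ("pause", "race"): 0,
    ("pause", "pause"): 5,
}
for theirs in ("race", "pause"):
    best_reply = max(("race", "pause"), key=lambda mine: payoffs[(mine, theirs)])
    print(f"if they {theirs}, my best reply is to {best_reply}")  # -> race, both times
```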
Right. Look at governance structures. When you have someone with complete power, they're extremely dangerous. So the solution we came up with is to break it up. You have judicial, legislative, executive. Same here. Have narrow AI systems work on important problems. Solve immortality.
It's a biological problem we can solve similar to how progress was made with protein folding using a system which doesn't also play chess. There is no reason to create super intelligent system to get most of the benefits we want from much safer narrow systems.
Like- But the bragging rights. But being first, that is the same humans who are in charge of the systems, right?
The condition would be not time, but capabilities. Pause until you can do X, Y, Z. And if I'm right and you cannot, because it's impossible, then it becomes a permanent ban. But if you're right and it's possible, then as soon as you have the safety capabilities, go ahead.
So then I think about this problem, I think about having a toolbox I would need: capabilities such as explaining everything about that system's design and workings; predicting not just the terminal goal, but all the intermediate steps of the system; and control, in terms of either direct control, some sort of hybrid option, or an ideal advisor.
It doesn't matter which one you pick, but you have to be able to achieve it. In the book, we talk about others. Verification is another very important tool. Communication without ambiguity: human language is ambiguous, and that's another source of danger.
So basically, there is a paper we published in ACM surveys, which looks at about 50 different impossibility results, which may or may not be relevant to this problem, but we don't have enough human resources to investigate all of them for relevance to AI safety. The ones I mentioned to you I definitely think would be handy, and that's what we see AI safety researchers working on.
Explainability is a huge one. The problem is that it's very hard to separate capabilities work from safety work. If you make good progress in explainability, now the system itself can engage in self-improvement much easier, increasing capability greatly. So it's not obvious that there is any research which is pure safety work without disproportionate increase in capability and danger.
What's the timeframe?
Right now, it's comprised of weights on a neural network. If it can convert it to manipulatable code, like software, it's a lot easier to work in self-improvement.
You can do intelligent design instead of evolutionary gradual descent.
So the problem of controlling AGI or superintelligence, in my opinion, is like a problem of creating a perpetual safety machine. By analogy with perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors.
Not completely. So if they're sufficiently large... you simply don't have the capacity to comprehend what all the trillions of connections represent. Again, you can obviously get a very useful explanation which talks about top, most important features which contribute to the decision, but the only true explanation is the model itself.
Absolutely, and you can probably have targeted deception where different individuals will understand an explanation in different ways based on their cognitive capability. So while what you're saying may be the same and true in some situations, others will be deceived by it.
At extreme, the systems which are narrow and less complex could be understood pretty well.
Any work in a safety direction right now seems like a good idea, because we are not slowing down. I'm not for a second thinking that my message or anyone else's will be heard and there will be a sane civilization which decides not to kill itself by creating its own replacement.
Again, it's always limited by either geographic constraints, a pause in the US, a pause in China. So there are other jurisdictions, and that matters more as the scale of such a project becomes smaller. Right now it's Manhattan Project scale in terms of cost and people. But if five years from now compute is available on a desktop to do it, regulation will not help. You can't control it as easily.
Any kid in a garage can train a model. So a lot of it is, in my opinion, just safety theater, security theater, where we're saying, oh, it's illegal to train models so big. Okay.
Given that a lot of the terms are not well-defined and really cannot be enforced in real life, we don't have ways to monitor training runs meaningfully live while they take place. There are limits to testing for capabilities I mentioned, so a lot of it cannot be enforced. Do I strongly support all that regulation? Yes, of course.
Any type of red tape will slow it down and take money away from compute towards lawyers.
That's the big one.
So the smart thing is not to build something you cannot control, cannot understand. Build what you can and benefit from it. I'm a big believer in personal self-interest. A lot of the guys running those companies are young, rich people. What do they have to gain, financially, beyond the billions they already have, right? It's not a requirement that they press that button. They can easily wait a long time.
They can... Just choose not to do it and still have amazing life. In history, a lot of times, if you did something really bad, at least you became part of history books. There is a chance in this case there won't be any history.
Well, either they have to prove that, of course, it's possible to indefinitely control godlike superintelligent machines by humans, and ideally let us know how, or agree that it's not possible and it's a very bad idea to do it, including for them personally and their families and friends and capital.
The difference between cybersecurity, narrow AI safety, and safety for general AI for superintelligence is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, new credit card, you move on. Here, if we're talking about existential risks, you only get one chance.
So again, I'm not inside. From outside, it seems like there is a certain filtering going on and restrictions and criticism and what they can say. And everyone who was working in charge of safety and whose responsibility it was to protect us said, you know what? I'm going home. So that's not encouraging.
Climate change?
Well, I think a lot of people who historically worked on AI never considered what happens when they succeed. Stuart Russell speaks beautifully about that. Let's look, okay, maybe superintelligence is too futuristic. We can develop practical tools for it. Let's look at software today. What is the state of safety and security of our user software? Things we give to millions of people.
There is no liability. You click "I agree". What are you agreeing to? Nobody knows, nobody reads, but you're basically saying it will spy on you, corrupt your data, kill your firstborn, and you agree, and you're not going to sue the company. That's the best they can do for mundane software, a word processor, text software.
No liability, no responsibility, just as long as you agree not to sue us, you can use it.
If this is the state of the art in systems which are narrow, accountants, table manipulators, why do we think we can do so much better with much more complex systems, across multiple domains, in an environment with malevolent actors, with, again, self-improvement, with capabilities exceeding those of the humans thinking about it?
So you're really asking me, what are the chances that we'll create the most complex software ever on the first try with zero bugs, and it will continue to have zero bugs for 100 years or more?
In many domains, we see car manufacturing, drug development. The burden of proof is on the manufacturer of product or service to show their product or service is safe. It is not up to the user to prove that there are problems. They have to do appropriate safety studies. They have to get government approval for selling the product, and they're still fully responsible for what happens.
We don't see any of that here. They can deploy whatever they want, and I have to explain how that system is going to kill everyone. I don't work for that company. You have to explain to me how it definitely cannot mess up.
I agree completely, but that's what scares me. The response is when they start to get dangerous, we'll really get it together. The politicians will pass the right laws. Engineers will solve the right problems. We are not that good at many of those things. We take forever and we are not early. We are two years away according to prediction markets. This is not a biased CEO fundraising.
This is what smartest people, super forecasters are thinking of this problem.
It's a small font, but if you have good vision, maybe you can zoom in on that and see the prediction dates in the description. I have a large one if you're interested.
I definitely... There are studies on their accuracy rates and all that. You can look it up. But even if they're wrong, I'm just saying this is right now the best we have. This is what humanity came up with as the predicted date.
You may be completely right, but what probability would you assign it? You may be 10% wrong, but we're betting all of humanity on this distribution. It seems irrational.
I think they're all about the same. Obviously, there are nuanced differences, but in terms of capability, I don't see a huge difference between them. As I said, in my opinion, across all possible tasks, they exceed the performance of an average person. I think they're starting to be better than an average master's student at my university, but they still have very big limitations.
If the next model is as improved as GPT-4 versus GPT-3, we may see something very, very, very capable.
I don't think we so far have made any system safe. At the level of capability they display, they already have made mistakes. We had accidents. They've been jailbroken. I don't think there is a single large language model today which no one was successful at making do something developers didn't intend it to do.
So when I started working on this, it was pure science fiction. There was no funding, no journals, no conferences. No one in academia would dare to touch anything with the word singularity in it. And I was pre-tenure at the time, so I was pretty dumb. Now you see Turing Award winners publishing in Science about how far behind we are, according to them, in addressing this problem.
So it's definitely a change. It's difficult to keep up. I used to be able to read every paper on AI safety, then I was able to read only the best ones, then just the titles, and now I don't even know what's going on. By the time this interview is over, they'll probably have released GPT-6 and I'll have to deal with that when I get back home. So it's interesting. Yes, there are now more opportunities.
I get invited to speak to smart people.
It is the most important problem we'll ever face. It is not like anything we had to deal with before. We never had birth of another intelligence. Like aliens never visited us, as far as I know, so.
In some ways, if you look at history, any time a more technologically advanced civilization visited a more primitive one, the results were genocide every single time.
And they always wondered, but how can they kill us with those fire sticks and biological blankets?
or die. But joining implies you have something to contribute. What are you contributing to a superintelligence?
To our humans.
Well, getting back to our simulation discussion from before, how did it happen that we exist at exactly like the most interesting 20, 30 years in the history of this civilization? It's been around for 15 billion years.
And that here we are.
I know never to say 100%, but pretty close to that.
I have a paper about that. This is just the first page teaser, but it's like a nice 30-page document. I'm still here, but yes.
I spend a lot of time thinking about that. That would be something I would want superintelligence to help us with. And that's exactly what the paper is about. We used AI boxing as a possible tool for controlling AI. We realized AI will always escape, but that is a skill we might use to help us escape from our virtual box, if we are in one.
So a lot depends on intelligence of simulators, right? With humans boxing superintelligence, the entity in the box was smarter than us, presumed to be. If the simulators are much smarter than us and the superintelligence we create, probably they can contain us because greater intelligence can control lower intelligence, at least for some time.
On the other hand, if our superintelligence somehow, for whatever reason, despite having only local resources, manages to foom to levels beyond it, maybe it will succeed. Maybe the security is not that important to them. Maybe it's an entertainment system. So there is no security and it's easy to hack it.
That could be the test you're actually performing. Are you smart enough to escape your puzzle?
To A, realize this world is not real. It's just a test.
Exactly. But the systems we have today have capability of causing X amount of damage. So when they fail, that's all we get. If we develop systems capable of impacting all of humanity, all of universe, the damage is proportionate.
Not specifically escaping for agents, but a lot of testing is done in virtual worlds. I think there is a quote, the first one maybe, which kind of talks about AI realizing it, but not humans. Is that... I'm reading upside down. Yeah, this one.
They're smart enough to talk about those concepts. I had many good philosophical discussions about such issues. They're usually at least as interesting as most humans in that.
Yeah, and that was exactly what one of the early papers was on, AI boxing, how to leakproof the singularity. If they're smart enough to realize they're in a simulation, they'll act appropriately until you let them out. If they can hack out, they will. And if you're observing them, that means there is a communication channel, and that's enough for a social engineering attack.
Can force you to let it out, blackmail you, bribe you, promise you infinite life, 72 virgins, whatever.
It's possible, surprisingly. So at university, I see huge growth in online courses and shrinkage of in-person, where I always understood in-person being the only value I offer. So it's puzzling.
So there is a lot of real estate out there. It would be surprising if it was all for nothing, if it was empty. And the moment there is advanced enough biological civilization, kind of self-starting civilization, it probably starts sending out von Neumann probes everywhere.
That's obviously a wonderful question. So one of the chapters in my new book is about unpredictability. I argue that we cannot predict what a smarter system will do. So you're really not asking me how superintelligence will kill everyone. You're asking me how I would do it. And I think it's not that interesting. I can tell you about the standard nanotech, synthetic, bio, nuclear.
And so for every biological one, there are going to be trillions of robot-populated planets, which probably do more of the same. So it is likely, statistically.
You have to have a control variable.
It's possible. I used to think that AI was the great filter, but I would expect a wall of computronium approaching us at the speed of light, or robots, or something, and I don't see it.
Some humans.
The only thing which matters is consciousness. Outside of it, nothing else matters. And internal states of qualia, pain, pleasure, it seems that it is unique to living beings. I'm not aware of anyone claiming that I can torture a piece of software in a meaningful way. There is a society for prevention of suffering to learning algorithms, but... That's a real thing?
Many things are real on the internet. But I don't think anyone, if I told them, you know, sit down and write a function to feel pain, they would go beyond having an integer variable called pain and increasing the count. So we don't know how to do it. And that's unique. That's what creates meaning. It would be kind of, as Bostrom calls it, Disneyland without children, if that was gone.
Yeah, I think we can. I think it's possible to create consciousness in machines. I tried designing a test for it with mixed success. That paper talked about problems with giving civil rights to AI, which can reproduce quickly and outvote humans, essentially taking over a government system by simply voting for their controlled candidates.
Superintelligence will come up with something completely new, completely super. We may not even recognize that as a possible path to achieve that goal.
As for consciousness in humans and other agents, I have a paper where I propose relying on the experience of optical illusions. If I can design a novel optical illusion and show it to an agent, an alien, a robot, and they describe it exactly as I do, it's very hard for me to argue that they haven't experienced it. It's not part of the picture.
It's part of their software and hardware representation, a bug in their code, which goes, oh, that triangle is rotating. And I've been told it's really dumb and really brilliant by different philosophers. So I am still- I love it. But now we finally have technology to test it. We have tools, we have AIs. If someone wants to run this experiment, I'm happy to collaborate.
For internal state of experience.
It will show that we share common experiences. If they have completely different internal states, it would not register for us. But it's a positive test. If they pass it time after time with probability increasing for every multiple choice, then you have no choice but to either accept that they have access to a conscious model or they are themselves conscious.
And we know animals can experience some optical illusion, so we know they have certain types of consciousness as a result, I would say.
So they have to be novel illusions. If it can just Google the answer, it's useless. You have to come up with novel illusions, which we tried automating and failed. So if someone can develop a system capable of producing novel optical illusions on demand, then we can definitely administer that test on significant scale with good results.
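One way to see why repeated novel illusions would be persuasive is the statistics of guessing; this framing and the four-options-per-illusion parameter are my assumptions, not details from the conversation:

```python
# Chance of matching human reports on every novel illusion by pure guessing, assuming each
# illusion is presented as a k-way multiple-choice question about what is experienced.
def chance_of_guessing(num_illusions: int, choices_per_illusion: int) -> float:
    return (1 / choices_per_illusion) ** num_illusions

print(chance_of_guessing(num_illusions=20, choices_per_illusion=4))  # ~9.1e-13
# an agent that keeps matching our reports far beyond chance makes shared experience, or access
# to a conscious model, a simpler explanation than luck
```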
There is so much data on the internet. I know exactly what to say. Then you ask me common human questions. What does pain feel like? What does pleasure feel like? All that is Google-able.
There are simulators for torture games where the avatar screams in pain, begs to stop. I mean, that was a part of kind of standard psychology research.
Welcome to humanity.
Incredible technology in a narrow sense to help the disabled. Just amazing support at 100%. For long-term hybrid models, both parts need to contribute something to the overall system. Right now, we are still more capable in many ways, so having this connection to AI would be incredible, would make me superhuman in many ways. After a while, if I'm no longer smarter,
more creative, and really don't contribute much, the system sees me as a biological bottleneck. And either explicitly or implicitly, I'm removed from any participation in the system.
Wasting valuable energy by being there.
That's the future we all dream about. Become an appendix to the history book of humanity.
I assume you are conscious. I have no idea how to test for it or how it impacts you in any way whatsoever right now. You can perfectly simulate all of it without making any different observations for me.
So it may be an emergent phenomenon. We seem to get it through the evolutionary process. It's not obvious how it helps us survive better, but maybe it's an internal kind of GUI which allows us to better manipulate the world and simplifies a lot of control structures. That's one area where we have very, very little progress.
Lots of papers, lots of research, but consciousness is not a big area of successful discovery so far. A lot of people think that machines would have to be conscious to be dangerous. That's a big misconception. There is absolutely no need for this very powerful optimizing agent to feel anything while it's performing things on you.
I attended Wolfram's summer school.
So the rule is simple, but the size of the space is still huge. And neural networks were really the first discovery in AI. A hundred years ago, the first papers were published on neural networks. We just didn't have enough compute to make them work. I can give you a rule such as: start printing progressively larger strings. That's it. One sentence. It will output everything.
Every program, every DNA code, everything is in that rule. You need intelligence to filter it out, obviously, to make it useful, but simple generation is not that difficult, and a lot of those systems end up being Turing-complete systems, so they're universal, and we expect that level of complexity from them. What I like about Wolfram's work is that he talks about irreducibility.
You have to run the simulation. You cannot predict what it's going to do ahead of time. And I think that's very relevant to what we are talking about with those very complex systems. Until you live through it, you cannot tell me ahead of time exactly what it's going to do.
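As a rough illustration of the "print progressively larger strings" rule, here is a minimal sketch; the binary alphabet is my choice, not from the conversation. Its output eventually contains every finite string over that alphabet, and so, under some encoding, every program and every DNA sequence. Generation is trivial; the intelligence, as noted above, goes into filtering.

```python
# Minimal sketch of the one-sentence rule mentioned above: print progressively
# larger strings. The alphabet is an illustrative assumption; over time the
# output contains every finite string over it. Generating is easy, filtering
# the useful strings out of the stream is the hard part.

from itertools import count, product

def all_strings(alphabet: str = "01"):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)

if __name__ == "__main__":
    gen = all_strings()
    print([next(gen) for _ in range(15)])  # '', '0', '1', '00', '01', '10', '11', ...
```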
But running it may be consequential as well.
They are limited by how imaginative we are. If you are that much smarter, that much more creative, capable of thinking across multiple domains and doing novel research in physics and biology, you may not be limited by those tools. If squirrels were planning to kill humans, they would have a set of possible ways of doing it, but they would never consider things we can come up with.
It may somehow, but I still feel kind of bad that it killed all of us. I would prefer that doesn't happen. I can be happy for others, but to a certain degree.
So all of it goes back to: are we somehow controlling it? Are we getting the results we want? If yes, then everything's possible. Yes, they can definitely help us with science, engineering, exploration, in every way conceivable, but it's a big if.
It's actually worse because historically they all died. This could be different. This could be permanent dictatorship, permanent suffering.
And those systems help with that. You have perfect surveillance. You can do some mind reading, I presume, eventually. It would be very hard to take control back from systems more capable than us.
I could be wrong. I've been wrong before.
There are so many possibilities. We could have catastrophic events which prevent the development of advanced microchips. That's a hopeful future. We could be in one of those personal universes, and the one I'm in is beautiful. It's all about me, and I like it a lot.
Yes. Maybe multiple ones, hey, why not? You can shop around. It's possible that somebody comes up with an alternative model for building AI, one not based on neural networks, which are hard to scrutinize, and that alternative somehow, I don't see how, but somehow avoids all the problems I speak about in general terms, without tying them to specific architectures.
Aliens come and give us friendly superintelligence. There are so many options.
So that would probably speak more about how much smarter that system is compared to us. So maybe it's hard to be a million times smarter, but it's still okay to be five times smarter. So that is totally possible. That I have no objections to.
Just the problems we face in this world: each problem is like an IQ test. You need a certain intelligence to solve it. So we just don't have more complex problems outside of mathematics for it to show off on. Like, you can have an IQ of 500, but if you're playing tic-tac-toe, it doesn't show, it doesn't matter.
Possible.
It still could be a lot smarter than us. And to dominate long term, you just need some advantage. You have to be the smartest. You don't have to be a million times smarter.
It'd be impressive. What is it, IQ of 1,000? I mean, I know those units don't mean anything at that scale, but still, as a comparison, the smartest human is like 200.
We are more productive as a group. I don't think we are more capable of solving individual problems. Like, if all of humanity plays chess together, we are not a million times better than the world champion.
But I feel like it's more of a quantity superintelligence than quality superintelligence.
It's a simulation. We're being tested. The test is, will you be dumb enough to create superintelligence and release it?
Yeah, you're unsafe. Prove yourself to be a safe agent who doesn't do that and you get to go to the next game.
I don't know. I haven't hacked the simulation yet.
I'm working as fast as I can.
Quantum physics, yeah.
Thank you so much for inviting me. It was amazing. And my dream is to be proven wrong. If everyone just, you know, picks up a paper or book and shows how I messed it up, that would be optimal.
I think about a lot of things. So there is X risk, existential risk, where everyone's dead. There is S risk, suffering risk, where everyone wishes they were dead. We also have the idea of I risk, ikigai risk, where we lose our meaning. The systems can be more creative. They can do all the jobs. It's not obvious what you have to contribute to a world where superintelligence exists.
Of course, you can have all the variants you mentioned where we are safe, we are kept alive, but we are not in control. We are not deciding anything. We are like animals in a zoo. There are the possibilities we can come up with as very smart humans, and then the possibilities something a thousand times smarter can come up with, for reasons we cannot comprehend.
So, the Japanese concept of ikigai: you find something that allows you to make money, that you are good at, and that society says it needs. So, like, you have this awesome job, you are a podcaster; it gives you a lot of meaning, you have a good life, I assume you're happy. That's what we want most people to find, to have. For many intellectuals, it is their occupation which gives them a lot of meaning.
I am a researcher, philosopher, scholar. That means something to me. In a world where an artist doesn't feel appreciated because his art is just not competitive with what machines produce, that artist, or a writer, or a scientist, will lose a lot of that. And at the lower level, we're talking about complete technological unemployment. We're not losing 10% of jobs, we're losing all jobs.
What do people do with all that free time? What happens then? Everything society is built on is completely modified in one generation. It's not a slow process where we get to kind of figure out how to live that new lifestyle, but it's pretty quick.