Roman Yampolskiy
Podcast Appearances
What's the timeframe?
So the problem of controlling AGI or superintelligence, in my opinion, is like a problem of creating a perpetual safety machine. By analogy with perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors.
The difference between cybersecurity, narrow AI safety, and safety for general AI for superintelligence is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, new credit card, you move on. Here, if we're talking about existential risks, you only get one chance.
So you're really asking me, what are the chances that we'll create the most complex software ever on the first try with zero bugs, and it will continue to have zero bugs for 100 years or more?
I don't think we so far have made any system safe. At the level of capability they display, they already have made mistakes. We had accidents. They've been jailbroken. I don't think there is a single large language model today which no one was successful at making do something developers didn't intend it to do.
Exactly. But the systems we have today have capability of causing X amount of damage. So when they fail, that's all we get. If we develop systems capable of impacting all of humanity, all of universe, the damage is proportionate.
That's obviously a wonderful question. So one of the chapters in my new book is about unpredictability. I argue that we cannot predict what a smarter system will do. So you're really not asking me how superintelligence will kill everyone. You're asking me how I would do it. And I think it's not that interesting. I can tell you about the standard nanotech, synthetic bio, nuclear.