Eliezer Yudkowsky
Help me out here.
I'm trying not to be.
I'm also trying to constrain myself to saying things that I think are true, and not just things that get you to agree with me.
Okay, so I'll get there, but the one thing I want to note is that this has not been remotely how things have been playing out so far.
The capabilities are racing ahead, and the alignment stuff is crawling like a tiny little snail in comparison.
Got it.
So if this is your hope for survival, you need the future to be very different from how things have played out up to right now.
And you're probably trying to slow down the capability gains because there's only so much you can speed up that alignment stuff.
So again, the difficulty is what makes the human say, "I understand."
And is it true?
Is it correct?
Or is it something that fools the human?
When the verifier is broken, the more powerful suggester does not help.
It just learns to fool the verifier.
Previously, before all hell started to break loose in the field of artificial intelligence, there was this person trying to raise the alarm and saying, you know, in a sane world, we sure would have a bunch of physicists working on this problem before it becomes a giant emergency.
And other people being like, ah, well, you know, it's going really slow.
It's going to be 30 years away.
And only in 30 years will we have systems that match the computational power of human brains.
So now it's 30 years off.
We've got time.