Eliezer Yudkowsky
As it happens, I did pioneer the thing that appears when you hover over a link.
So I actually do get some credit for the user experience there.
It's an incredible user experience.
You don't realize how pleasant that is.
I think Wikipedia actually picked it up from a prototype of a different system that I was putting forth.
Or maybe they developed it independently.
But for everybody out there who was like, "No, no, they just got the hover thing off of Wikipedia,"
it's possible, for all I know, that Wikipedia got the hover thing off of Arbital, which was a prototype then.
And anyways.
It was incredibly well done, and to the team behind it, well, thank you.
Okay, so the fundamental difficulty there is, suppose I said to you, well, how about if the AI helps you win the lottery by trying to guess the winning lottery numbers, and you tell it how close it is to getting next week's winning lottery numbers, and it just keeps on guessing and keeps on learning until finally you've got the winning lottery numbers.
One way of decomposing problems is suggestor-verifier.
Not all problems decompose like this very well, but some do.
If the problem is, for example, guessing a password that will hash to a particular hash text, where you have what the password hashes to but not the original password, then if I present you a guess, you can tell very easily whether or not the guess is correct.
So verifying a guess is easy, but coming up with a good suggestion is very hard.
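The asymmetry in the password example can be made concrete with a small sketch. Everything here is an illustrative assumption rather than something from the conversation: SHA-256 stands in for whatever hash is used, and the function names `hash_password`, `verify`, and `suggest`, the three-letter lowercase search space, and the sample password are all made up for the demo. Verifying costs one hash; suggesting, done naively, means trying candidates until one passes.

```python
import hashlib
import string
from itertools import product


def hash_password(password: str) -> str:
    """Hash a password with SHA-256 (an assumed choice of hash for this demo)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify(guess: str, target_hash: str) -> bool:
    """Verifier: a single hash and a comparison, so checking a guess is cheap."""
    return hash_password(guess) == target_hash


def suggest(target_hash: str, alphabet: str = string.ascii_lowercase, max_len: int = 3):
    """Suggester: brute-force search, calling the verifier on every candidate.

    The number of candidates grows as len(alphabet) ** length, which is why
    producing a good suggestion is the hard half of the problem.
    """
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            guess = "".join(chars)
            if verify(guess, target_hash):
                return guess
    return None  # nothing in the searched space matched


# We only keep the hash, as in the scenario described above.
target = hash_password("cab")
recovered = suggest(target)
```

Even in this toy setting the suggester has to hash up to 18,278 candidates, while the verifier hashes one. Each added character multiplies the suggester's work by the alphabet size and leaves the verifier's work essentially unchanged.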
And when you can easily tell whether the AI output is good or bad or how good or bad it is, and you can tell that accurately and reliably, then you can train an AI to produce outputs that are better.
And if you can't tell whether the output is good or bad, you cannot train the AI to produce better outputs.
So the problem with the lottery ticket example is that when the AI says, well, what if next week's winning lottery numbers are dot, dot, dot, dot, dot, you're like, I don't know.
Next week's lottery hasn't happened yet.
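A toy sketch of the same point, under loudly stated assumptions: simple hill climbing stands in for "training", a six-digit list stands in for the lottery numbers, and the names `hill_climb`, `reliable_score`, `no_feedback`, and `SECRET` are all hypothetical. With a scorer that reports accurately how close a guess is, the search improves; with a scorer that cannot say anything, as before the draw has happened, it never moves.

```python
import random

# Hypothetical hidden answer; in the lottery case this does not exist yet.
SECRET = [3, 1, 4, 1, 5, 9]


def reliable_score(guess):
    """Accurate feedback: how many positions of the guess are correct."""
    return sum(g == s for g, s in zip(guess, SECRET))


def no_feedback(guess):
    """No usable feedback: every guess looks equally good."""
    return 0


def hill_climb(score, n_digits=6, steps=2000, seed=0):
    """Keep a guess, mutate one digit at a time, accept only strict improvements."""
    rng = random.Random(seed)
    guess = [rng.randrange(10) for _ in range(n_digits)]
    best = score(guess)
    for _ in range(steps):
        candidate = list(guess)
        candidate[rng.randrange(n_digits)] = rng.randrange(10)
        s = score(candidate)
        if s > best:  # the scorer is the only signal steering the search
            guess, best = candidate, s
    return guess


with_feedback = hill_climb(reliable_score)
without_feedback = hill_climb(no_feedback)
```

With `reliable_score`, the search is expected to converge on `SECRET` well within the step budget. With `no_feedback`, no candidate is ever accepted, so the result is just the initial random guess.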