Eliezer Yudkowsky
Podcast Appearances
There's your red fire alarm of like, oh, no, alignment is difficult.
Is everybody going to shut everything down now?
For you.
So you put a line there and everybody else puts a line somewhere else and there's like, yeah, and there's like no agreement.
We have had a pandemic on this planet with a few million people dead, which we may never know whether or not it was a lab leak, because there was definitely cover-up.
We don't know if there was a lab leak, but we know that...
The people who did the research put out a whole paper saying this definitely wasn't a lab leak, and didn't reveal that they had sent off coronavirus research to the Wuhan Institute of Virology after it was banned in the United States.
After the gain-of-function research was temporarily banned in the United States.
And the same people who exported gain-of-function research on coronaviruses to the Wuhan Institute of Virology after that gain-of-function research was temporarily banned in the United States are now getting more grants to do more gain-of-function research on coronaviruses.
Maybe we do better in this than in AI, but this is not something we can take for granted, that there's going to be an outcry.
People have different thresholds for when they start to outcry.
Nothing like the world in front of us right now.
You've already seen that GPT-4 is not turning out this way.
And there are basic obstacles where you've got the weak version of the system that doesn't know enough to deceive you, and the strong version of the system that could deceive you if it wanted to do that, if it was already sufficiently unaligned to want to deceive you.
There's the question of how on the current paradigm you train honesty when the humans can no longer tell if the system is being honest.
I think they could be answered in 50 years with unlimited retries, the way things usually work in science.
Are Earth's billionaires going to put up the giant prizes that would maybe incentivize young hotshot people who just got their physics degrees to not go to the hedge funds and instead put everything into interpretability in this one small area where we can actually tell whether or not somebody has made a discovery or not?
I think so.
When?