Dwarkesh
And it basically comes down to, at this point, who the people leading this are.
And like, I feel like the company leaders have so far made slightly better noises about caring about alignment than the government leaders have.
Right.
If I learn that Tulsi Gabbard has a LessWrong alt with 10,000 karma, maybe I want the national security state.
For various reasons, there just is so much incentive pressure for them to win and beat each other and so forth, and so even though they have more of the relevant expertise, I also just don't trust them to do the right things. Daniel has already said that for this phase we're not making policy prescriptions; in another phase we may make policy suggestions. And one of the ones that Daniel has talked about,
that makes a lot of sense to me, is to focus on things about transparency.
So a regulation saying there have to be whistleblower protections.
This is a big part of our scenario, is that a whistleblower comes out and says, the AIs are horribly misaligned and we're racing ahead anyway.
And then the government pays attention.
Or another form of transparency: saying that every lab just has to publish its safety case.
I'm not as sure about this one because I think they'll kind of fake it or they'll publish a made-for-public-consumption safety case that isn't their real safety case.
But at least saying, like, here is some reason why you should trust us.
And then if all independent researchers say, no, actually, you should not trust them, then I don't know, they're embarrassed and maybe they try to do better.
Right.
There's actually a really interesting foretaste of this.
At some point...
Somebody asked Grok, like, who is the worst spreader of misinformation?
And it responded... I think it just refused to name Elon Musk.
Somebody kind of jailbroke it into revealing its prompt, and it was, like, don't say anything bad about Elon.
And then there was enough of an outcry that the head of xAI said, actually, that's not consonant with our values.