Dwarkesh

Speaker
1735 total appearances

Podcast Appearances

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model - Scott Alexander & Daniel Kokotajlo

This was a mistake.

We're going to take it out.

So we kind of want more things like that to happen where people are looking at, like...

Here it was the prompt, but I think very soon it's going to be the spec, where it's kind of more of an agent and it's understanding the spec at a deeper level.

And just thinking about that and being, and if it says like, by the way, try to manipulate the government into doing this or that, then we know that something bad has happened and

If it doesn't see that, then we can maybe trust it.

This is actually part of our misalignment story, is that if the AI is sufficiently misaligned, then yes, we can tell it it has to follow the spec.

But just as people with different views of the Constitution have managed to get it into a shape that probably the founders would not have recognized, so the AI will be able to say, well, the spec refers to the general welfare here.

I think we agree.

I think that's kind of why all of our policy prescriptions are things like more transparency, get more people involved, try to have lots of people working on this.

I think our epistemic prediction is that it's hard to maintain classical liberalism as you go into these really difficult arms races in times of crisis.

But I think that our policy prescription is let's try as hard as we can to make it happen.

Yeah, so I agree that the AIs are currently getting more reliable.

I think there are two reasons why they might fail to do what you want, kind of reflecting how they're trained.

One is that they're too stupid to understand their training.

The other is that you were too stupid to train them correctly, and they understood what you were doing exactly, but you messed it up.

So I think the first one is kind of what we're coming out of.

So GPT-3, if you asked it, "Are bugs real?", it would give this kind of hemming-and-hawing answer like, "Oh, we can never truly tell what is real."

Who knows?

Because it was trained to, kind of, "don't take difficult political positions," and a lot of questions like "is X real?" are things like "is God real?", where you don't want it to really answer that.