
Daniel Kokotajlo

👤 Person
608 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

Classical liberalism has just been a helpful way to navigate the world when we're under this kind of epistemic hell of one thing changing... you know, people who have... yeah.

Anyways, maybe one of you can actually flesh out that thought.

Better react to it if you disagree.

Hear, hear.

I agree.

So far, these systems, as they become smarter, seem to be more reliable agents that are more likely to do the thing I expect them to do.

Why does... I think in your scenario, in at least one of the stories, you have two different stories, one with a slowdown, where we more aggressively... I'll let you characterize it.

But in one half of the scenario, why does the story end in humanity getting disempowered and the thing just having its own crazy values and taking over?

Yeah, so...

It seems like this community is very interested in solving this problem at a technical level: making sure AIs don't lie to us, or maybe that they lie to us only in exactly the scenarios where we would want them to lie to us, or something.

Whereas, you know, as you were saying, humans have these exact same problems.

They reward hack.

They are unreliable.

They obviously do cheat and lie.

And the way we've solved it with humans is just checks and balances, and decentralization.

You could, like, lie to your boss and keep lying to your boss, but over time it's just not going to work out for you... or you become president or something.

Yeah, exactly.

One or the other.

So if you believe in this extremely fast takeoff, then if a lab is one month ahead, that's the endgame and this thing takes over.

But even then, I know I'm combining so many different topics.