Daniel Kokotajlo

👤 Person
608 total appearances

Podcast Appearances

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

This is actually a significant part of your worldview, so can you explain...

Yeah, your thoughts on why transparency through this period is important.

Let me just add on to that one of the many other reasons why I worry about nationalization or some kind of public-private partnership or even just very stringent regulation.

Actually, this is more an argument against very stringent regulation in favor of safety rather than deferring more to the labs on the implementation.

is that it just seems like we don't know what we don't know about alignment.

Every few weeks, there's this new result.

OpenAI had this really interesting result recently where they're like, hey, they often tell you if they want to hack, like in the chain of thought itself, and it's important that you don't train against

the chain of thought where they tell you they're going to hack because they'll still do the hacking if you train against it.

They just won't tell you about it.

You can imagine very naive regulatory responses.

It doesn't just have to be regulations.

One might be more optimistic that if it's an executive order or something, it'll be more flexible.

I just think that relies on a level of goodwill and...

flexibility on the behalf of a regulator.

But suppose the

There's some department that says, if we catch you, if we catch your AI saying that they want to take over or do something bad, then you'll be really heavily punished.

Your immediate response as a lab is to just be like, okay, let's train them away from saying this.

So you can imagine all kinds of ways in which a top-down mandate from the government to the labs of safety would just really backfire.

And given how fast things are moving, maybe it makes more sense to,

leave these kinds of implementation decisions or even high-level overall... What is the word?