Trenton Bricken
And so in the paper, we also ask it, what paper did Andrej Karpathy write?
And so it recognizes the name Andrej Karpathy because he's sufficiently famous.
So that turns off the "I don't know" reply.
But then when it comes time for the model to say what paper it worked on, it doesn't actually know any of his papers.
And so then it needs to make something up.
And so you can see different components and different circuits all interacting at the same time to lead to this final answer.
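As a rough illustration of that interaction, here is a toy sketch in Python. It is not the paper's actual circuits; the names, the gating logic, and the `fabricate` helper are all invented for illustration. It just shows how a "known entity" signal can suppress the default "I don't know" reply even when no facts are actually recalled:

```python
# Toy sketch (hypothetical, not the paper's code): a "known entity"
# feature gates off the default "I don't know" reply, so when the
# recall step comes up empty, the only path left is confabulation.

FAMOUS_NAMES = {"Andrej Karpathy"}   # the recognition feature fires on these
KNOWN_PAPERS = {}                    # the recall circuit has no entry for him

def fabricate(name):
    # Stand-in for the model sampling a plausible-sounding title.
    return f"'Advances in {name.split()[-1]} Networks' (confabulated)"

def answer_paper_question(name):
    entity_is_known = name in FAMOUS_NAMES   # "known entity" feature
    papers = KNOWN_PAPERS.get(name)          # factual recall

    if not entity_is_known:
        return "I don't know."               # default refusal stays active
    if papers:
        return papers[0]                     # grounded answer
    # Recognition switched off the refusal, but recall found nothing,
    # so the model has to make something up.
    return fabricate(name)

print(answer_paper_question("Andrej Karpathy"))
```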
I feel like you just want to go in with your eyes wide open, not making any assumptions about what that deception is going to look like or what the trigger might be.
And so the wider you can cast that net, the better.
Depending on how quickly AI accelerates and where the state of our tools is, we might not be in the place where we can prove from the ground up that everything is safe.
But I feel like that's a very good North Star.
It's a powerful, reassuring North Star for us to aim for, especially when we consider we are part of the broader AI safety portfolio.
I mean, do you really trust it? Like, you're about to deploy this system and you really hope it's aligned with humanity and that you've successfully iterated through all the possible ways that it's going to scheme or sandbag.
Like we want to pursue the entire portfolio.
We've got the therapist interrogating the patient by asking, "Do you have any troubling thoughts?"
We've got the linear probe, which I'd analogize to a polygraph test, where we're taking very high-level summary statistics of the person's well-being.
And then we've got the neurosurgeons kind of going in and seeing if you can find any brain components that are activating in troubling or off-distribution ways.
So I think we should do all of it.
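As a rough illustration of the polygraph analogy, here is a minimal linear-probe sketch on synthetic data. The activations and labels are fabricated; in a real setting the features would be the model's hidden activations and the labels would mark the behavior being screened for (say, deceptive versus honest completions):

```python
# Minimal linear-probe sketch on synthetic "activations" (illustrative
# only; a real probe would be fit on actual model internals).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n = 512, 2000

# Pretend the property of interest lives along one hidden direction.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

labels = rng.integers(0, 2, size=n)          # 1 = "troubling" examples
acts = rng.normal(size=(n, d_model))         # baseline activations
acts += np.outer(labels * 2.0, direction)    # shift positives along the direction

# The probe is just logistic regression on the activations: a single
# linear readout, like a polygraph's summary statistic.
probe = LogisticRegression(max_iter=1000).fit(acts[:1500], labels[:1500])
print("held-out accuracy:", probe.score(acts[1500:], labels[1500:]))
```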
What percent of the alignment portfolio should mech interp be?