Trenton Bricken

Speaker
1589 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? - Sholto Douglas & Trenton Bricken

And so in the paper, we also ask it, "What paper did Andrej Karpathy write?"

And so it recognizes the name Andrej Karpathy because he's sufficiently famous.

So that turns off the "I don't know" reply.

But then when it comes time for the model to say what paper he worked on, it doesn't actually know any of his papers.

And so then it needs to make something up.

And so you can see different components and different circuits all interacting at the same time to lead to this final answer.

I feel like you just want to go in with your eyes wide open, not making any assumptions for what that deception is going to look like.

or what the trigger might be.

And so the wider you can cast that net, the better.

Depending on how quickly AI accelerates and what the state of our tools is, we might not be in the place where we can prove from the ground up that everything is safe.

But I feel like that's a very good North Star.

It's a very powerful, reassuring North Star for us to aim for, especially when we consider we are part of the broader AI safety portfolio.

I mean, do you really trust - like, you're about to deploy this system and you really hope it's aligned with humanity and that you've, like, successfully iterated through all the possible ways that it's going to, like, scheme or sandbag.

Like we want to pursue the entire portfolio.

We've got the therapist interrogating the patient by asking, do you have any troubling thoughts?

We've got the linear probe, which I'd analogize to like a polygraph test where we're taking like

very high-level summary statistics of the person's well-being.

And then we've got the neurosurgeons kind of going in and seeing if you can find any brain components that are activating in troubling or off-distribution ways.

So I think we should do all of it.

What percent of the alignment portfolio should mech interp be?