Trenton Bricken
And it kind of has kept the other stuff around to the extent that it's like not harmful.
Here at the Dwarkesh Podcast.
Yeah, but my question is always, like, are you giving the model enough context?
And with agents now, like, are you giving it the tools such that it can go and get the context that it needs?
Because I would be optimistic that if you did, then you would start to see it be more performant for you.
All right, back to Trenton and Sholto.
I mean, even inside Anthropic and on the interpretability team, there is active debate over what the models can and can't do.
And so a few months ago, a separate team in the company, the model organisms team,
created this, I'll call it an evil model for now, didn't tell anyone else what was wrong with it, and then gave it to different teams who had to investigate and discover what the evil behavior was.
And so there were two interpretability teams that did this.
And we were ultimately successful.
One of the teams actually won in 90 minutes.
We were given three days to do it.
But more recently, I've developed what we're calling the interpretability agent, which is a version of Claude that has the same interpretability tools that we'll often use.
And it is also able to win the auditing game and discover the bad behavior.
End-to-end?
End-to-end, yeah.
You give it the same prompt that the humans had.
You fire it off, and it's able to converse with the evil model,
and call the get-top-active-features tool, which gives it the 100 most active features for whatever prompt it wants to use.
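The tool described here can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Anthropic's actual implementation: the feature ids, activation values, and function name are all invented, and real feature activations would come from running a prompt through an interpretability pipeline rather than a lookup table.

```python
def get_top_active_features(activations, k=100):
    """Return the k feature ids with the highest activation, descending.

    `activations` maps feature id -> activation strength for one prompt.
    (Toy stand-in for a real feature-extraction backend.)
    """
    ranked = sorted(activations.items(), key=lambda kv: kv[1], reverse=True)
    return [feature_id for feature_id, _ in ranked[:k]]


# Toy activations for a single prompt (values made up for the example).
acts = {
    "f_sycophancy": 0.91,
    "f_python_code": 0.12,
    "f_flattery": 0.87,
    "f_dates": 0.05,
}
top = get_top_active_features(acts, k=2)
print(top)  # ['f_sycophancy', 'f_flattery']
```

In an agent loop, a tool like this would be exposed alongside a chat tool, letting the agent alternate between conversing with the target model and inspecting which features fire on the prompts it tries.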