Trenton Bricken

I think it's important to distinguish between the model planning in latent space within a single forward pass and the model having an alien language that it outputs and uses as its scratchpad.

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Which one are we talking about?

No, but in the most extreme cases, it invents a new language that's super information dense.

I mean, that's what's so fun about the... if you look at the assistant tag, right?

Seeing these features light up in the auditing game for the model being evil.

Yeah.

Transluce has another example of this, where you ask a Llama model, "Who is Nicholas Carlini?"

And for background context, Nicholas Carlini is a researcher who was actually at DeepMind and has now come over to Anthropic.

But the model says, oh, I don't know who that is.

I couldn't possibly speculate.

But if you look at the features behind the scenes, you see a bunch light up for AI, computer security, all the things that Nicholas Carlini does.
