
Trenton Bricken

1589 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And I mean, even with the GPT-4.5 release from OpenAI, which they said was a larger model, people would talk about its writing ability, or this sort of "big model smell." And I think this is kind of getting at this deeper pool of intelligence, or ability to generalize.

I mean, all of the interpretability work on superposition indicates that the models are always underparameterized and are being forced to cram in as much information as they possibly can. And so if you don't have enough parameters and you're rewarding the model just for imitating certain behaviors, then it's less likely to have the space to form these very deep, broad generalizations.
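A toy sketch of the superposition idea, under loudly stated assumptions: features are stored as random unit directions (an illustrative assumption, not how a trained model actually chooses them), and the sizes DIM and N_FEATURES are hypothetical. With more features than dimensions, the directions cannot all be orthogonal, so reading one feature back out picks up interference from the others; keeping only a few features active at once is what keeps that interference tolerable.

```python
import math
import random

Vector = list[float]

def random_unit_vector(dim: int, rng: random.Random) -> Vector:
    """Sample a direction uniformly on the unit sphere (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def superpose(active: dict[int, float], directions: list[Vector]) -> Vector:
    """Encode sparse feature values as one dense vector: sum of value * direction."""
    dim = len(directions[0])
    out = [0.0] * dim
    for idx, value in active.items():
        for j in range(dim):
            out[j] += value * directions[idx][j]
    return out

def read_out(x: Vector, directions: list[Vector]) -> Vector:
    """Linear readout: project the dense vector onto every feature direction."""
    return [dot(x, d) for d in directions]

# Hypothetical toy sizes: 256 features squeezed into a 64-dimensional space.
DIM, N_FEATURES = 64, 256
rng = random.Random(0)
directions = [random_unit_vector(DIM, rng) for _ in range(N_FEATURES)]

# More features than dimensions means some pairs must overlap (non-zero cosine).
max_interference = max(
    abs(dot(directions[i], directions[j]))
    for i in range(N_FEATURES)
    for j in range(i + 1, N_FEATURES)
)

# Only a few features are active at once (sparsity), which limits interference.
active = {3: 1.0, 100: 1.0, 200: 1.0}
x = superpose(active, directions)
scores = read_out(x, directions)
top3 = sorted(range(N_FEATURES), key=lambda i: scores[i], reverse=True)[:3]
print(f"max |cos| between feature directions: {max_interference:.2f}")
print("top-scoring features:", sorted(top3), "true active:", sorted(active))
```

Shrinking DIM while holding N_FEATURES fixed raises the interference between random directions, which is one intuition for why an underparameterized model has less room for clean, separable representations.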

Yeah, yeah, yeah.

So yeah, in the circuits work, I mean, even with the Golden Gate Bridge, and by the way, this is a cable from the Golden Gate Bridge that the team acquired.

They had to destabilize the bridge in order to get this.

But Claude will fix it.

Claude loves the Golden Gate Bridge.

So even with this, for people who aren't familiar, we made Golden Gate Claude when we released our paper "Scaling Monosemanticity," where one of the 30 million features was for the Golden Gate Bridge.

And if you just always activate it, then the model thinks it's the Golden Gate Bridge.

If you ask it for a chocolate chip cookie recipe, it will tell you to use orange food coloring, or to bring the cookies and eat them on the Golden Gate Bridge. All of these sorts of associations.
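A minimal sketch of what "always activate it" can mean mechanically, assuming a standard ReLU sparse autoencoder with encoder weights w_enc and b_enc, per-feature decoder directions, and a decoder bias b_dec (all names hypothetical). Clamping replaces one feature's latent value at every token position while keeping the autoencoder's reconstruction error, so only that feature's contribution changes. This is one common steering recipe; the exact procedure behind Golden Gate Claude may differ.

```python
Vector = list[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

class SparseAutoencoder:
    """Toy ReLU sparse autoencoder; parameter names are hypothetical."""

    def __init__(self, w_enc: list[Vector], b_enc: Vector,
                 decoder_dirs: list[Vector], b_dec: Vector) -> None:
        self.w_enc = w_enc                # one encoder row per feature
        self.b_enc = b_enc                # one encoder bias per feature
        self.decoder_dirs = decoder_dirs  # one decoder direction per feature
        self.b_dec = b_dec                # decoder bias, subtracted before encoding

    def encode(self, x: Vector) -> Vector:
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return [max(0.0, dot(w, centered) + b) for w, b in zip(self.w_enc, self.b_enc)]

    def decode(self, f: Vector) -> Vector:
        out = list(self.b_dec)
        for fi, d in zip(f, self.decoder_dirs):
            for j, dj in enumerate(d):
                out[j] += fi * dj
        return out

def clamp_feature(sae: SparseAutoencoder, x: Vector, feature: int, value: float) -> Vector:
    """Replace one feature's latent value with `value`, keeping x - decode(encode(x))."""
    f = sae.encode(x)
    delta = value - f[feature]
    # decode(f') differs from decode(f) only by delta * direction, and the error
    # term is added back unchanged, so the result is x + delta * direction.
    return [xi + delta * di for xi, di in zip(x, sae.decoder_dirs[feature])]

def steer_sequence(sae: SparseAutoencoder, activations: list[Vector],
                   feature: int, value: float) -> list[Vector]:
    """Clamp the feature at every token position ("always activate it")."""
    return [clamp_feature(sae, x, feature, value) for x in activations]

# Hypothetical 3-dimensional example with two features; feature 0 stands in
# for the Golden Gate Bridge feature.
GOLDEN_GATE = 0
toy_sae = SparseAutoencoder(
    w_enc=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    b_enc=[0.0, 0.0],
    decoder_dirs=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    b_dec=[0.0, 0.0, 0.0],
)
tokens = [[0.2, 0.5, 0.1], [0.0, 0.3, 0.4]]
steered = steer_sequence(toy_sae, tokens, GOLDEN_GATE, 10.0)
print([toy_sae.encode(x) for x in steered])
```

Keeping the error term means the parts of each activation the autoencoder fails to explain pass through untouched; only the clamped feature's direction is shifted.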

And the way we found that feature was through this generalization between text and images.

So I actually implemented the ability to put images into our feature activations, because this was all on Claude 3 Sonnet, which was one of our first multimodal models.

So we trained the sparse autoencoder and its features only on text.
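A sketch of why a text-only sparse autoencoder can still be applied to images, assuming a standard ReLU autoencoder and a squared-error-plus-L1 objective (a common choice; the paper's exact loss may differ, and all names and numbers here are hypothetical). The encoder is just a function of an activation vector and has no notion of modality, so activations from image tokens can be run through features learned only on text. Whether a feature like the Golden Gate Bridge one then fires depends on the model representing the concept similarly in both modalities. The gradient-descent training step itself is omitted.

```python
Vector = list[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def encode(x: Vector, w_enc: list[Vector], b_enc: Vector, b_dec: Vector) -> Vector:
    """ReLU encoder. It only sees an activation vector, never the token's modality."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [max(0.0, dot(w, centered) + b) for w, b in zip(w_enc, b_enc)]

def decode(f: Vector, decoder_dirs: list[Vector], b_dec: Vector) -> Vector:
    """Reconstruction: decoder bias plus each feature value times its direction."""
    out = list(b_dec)
    for fi, d in zip(f, decoder_dirs):
        for j, dj in enumerate(d):
            out[j] += fi * dj
    return out

def sae_loss(batch: list[Vector], w_enc: list[Vector], b_enc: Vector,
             decoder_dirs: list[Vector], b_dec: Vector, l1_coeff: float) -> float:
    """Batch average of squared reconstruction error plus an L1 sparsity penalty."""
    total = 0.0
    for x in batch:
        f = encode(x, w_enc, b_enc, b_dec)
        x_hat = decode(f, decoder_dirs, b_dec)
        squared_error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        total += squared_error + l1_coeff * sum(f)  # f >= 0, so sum(f) is its L1 norm
    return total / len(batch)

# Hypothetical toy parameters: 2 features in a 3-dimensional activation space.
W_ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B_ENC = [0.0, 0.0]
DECODER_DIRS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B_DEC = [0.0, 0.0, 0.0]

# The objective is only ever evaluated on activations from text tokens...
text_batch = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss = sae_loss(text_batch, W_ENC, B_ENC, DECODER_DIRS, B_DEC, l1_coeff=0.1)

# ...but the same encoder can afterwards be applied to an image token's activation.
image_activation = [0.9, 0.1, 0.0]
features = encode(image_activation, W_ENC, B_ENC, B_DEC)
strongest = max(range(len(features)), key=lambda i: features[i])
print(f"text loss: {loss:.3f}; image features: {features}; strongest: {strongest}")
```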