Trenton Bricken

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And I mean, even with the 4.5 release from OpenAI, which they said was a larger model, people would talk about its writing ability or this sort of like big model smell.

1413.372

Dwarkesh Podcast

And I think this is kind of getting at this like,

1424.629

deeper pool of intelligence or ability to generalize.

1427.453

I mean, all of the interpretability work on superposition states that the models are always underparameterized and they're being forced to cram as much information in as they possibly can.

1431.617

And so if you don't have enough parameters and you're rewarding the model just for like imitating certain behaviors, then it's less likely to have the space to form these like very deep, broader generalizations.

1441.406

Yeah, yeah, yeah.

1466.826

So yeah, in the circuits work, I mean, even with the Golden Gate Bridge, and by the way, this is a...

1467.588

a cable from the Golden Gate Bridge that the team acquired.

1472.24

They had to destabilize the bridge in order to get this.

1476.205

But Claude will fix it.

1480.131

Claude loves the Golden Gate Bridge.

1481.192

So even with this, for people who aren't familiar, we made Golden Gate Claude when we released our paper, Scaling Monosemanticity,

1483.415

where one of the 30 million features was for the Golden Gate Bridge.

1491.106

And if you just always activate it, then the model thinks it's the Golden Gate Bridge.

1494.912

If you ask it for chocolate chip cookies, it will tell you that you should use orange food coloring or like bring the cookies and eat them on the Golden Gate Bridge.

1498.938

All of these sort of associations.

1506.33

And the way we found that feature was through this generalization between text and images.

1508.313

So...

1514.222

I actually implemented the ability to put images into our feature activations, because this was all on Claude 3 Sonnet, which was one of our first multimodal models.

1515.524

So we only trained the sparse autoencoder and the features on text.

1526.377