Trenton Bricken

Speaker
1589 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

So mechanistic interpretability, or mech interp as the cool kids call it, is trying to reverse engineer neural networks and figure out what the core units of computation are.

Lots of people think that because we made neural networks, because they're artificial intelligence, we have a perfect understanding of how they work, but it couldn't be further from the truth.

Neural networks, the AI models that you use today, are grown, not built.

And so we then need to do a lot of work after they're trained to figure out, to the best of our abilities, how they're actually going about their reasoning.

And so two and a half to three and a half years ago, this agenda of applying mechanistic interpretability to large language models started with Chris Olah leaving OpenAI and co-founding Anthropic.

And roughly every six months since then, we've had a major breakthrough in our understanding of these models.

And so, first with Toy Models of Superposition, we established that models are really trying to cram as much information as they possibly can into their weights. This goes directly against people saying that neural networks are overparameterized. In classic machine learning, back in the day, you would use linear regression or something like it, and people had this meme of deep learning using way too many parameters. There's a funny meme you should show, with the number of layers on the x-axis and performance on the y-axis, and this jiggly line that just goes up; it's like, oh, just throw in more layers.

But it actually turns out that at least for really hard tasks like being able to accurately predict the next token for the entire internet, these models just don't have enough capacity.

And so they need to cram in as much as they can.

And the way they learn to do that is to use each of their neurons or units of computation in the model for lots of different things.

And so if you try to make sense of the model by asking, oh, what happens if I remove this one neuron, or what is it doing in the model, it's impossible to make sense of it.

It'll fire for like Chinese and fishing and horses and I don't know, just like a hundred different things.

And it's because it's trying to juggle all these tasks and use the same neuron to do it.

So that's superposition.
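
As a concrete illustration, here is a minimal sketch in the spirit of the toy-model setup described above: more sparse features than there are dimensions to store them in, with a small network trained to reconstruct them anyway. The feature count, dimensions, sparsity level, and training details are illustrative assumptions, not the paper's actual configuration.

```python
import torch

# Assumed toy sizes: far more features than hidden dimensions.
n_features, d_hidden = 20, 5
W = torch.nn.Parameter(torch.randn(n_features, d_hidden) * 0.1)
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(3000):
    # Synthetic sparse data: each feature is active only ~5% of the time.
    active = (torch.rand(256, n_features) < 0.05).float()
    x = torch.rand(256, n_features) * active
    h = x @ W                         # squeeze 20 features into 5 dimensions
    x_hat = torch.relu(h @ W.T + b)   # reconstruct every feature from the bottleneck
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Rows of W are the directions assigned to each feature. With more features
# than dimensions they cannot all be orthogonal, so the off-diagonal entries
# of this overlap matrix show features sharing directions: superposition.
print((W @ W.T).detach())
```

Because the features are sparse, the model tolerates that interference, which is why a single direction, or a single neuron, can end up responding to many unrelated things.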

Nine months later, we write Towards Monosemanticity, which introduces what are called sparse autoencoders.

And so, going off what I just said about the model trying to cram too much into too little space, we give it more space, this higher-dimensional representation, where it can then more cleanly represent all of the concepts that it's understanding.
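
A minimal sketch of that idea, loosely in the spirit of a sparse autoencoder: activations are mapped into a much larger dictionary of features under an L1 sparsity penalty, then reconstructed. The dimensions, penalty coefficient, and random stand-in activations here are assumptions for illustration, not the actual training setup.

```python
import torch
import torch.nn as nn

# Assumed sizes: expand the model's activations into a much larger feature space.
d_model, d_dict = 512, 4096

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, acts):
        f = torch.relu(self.enc(acts))  # sparse, higher-dimensional feature activations
        return self.dec(f), f           # reconstruction of the original activations

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # pressure to keep only a few features active per input

# Stand-in for activations you would actually record from a transformer layer.
acts = torch.randn(1024, d_model)

for step in range(200):
    recon, f = sae(acts)
    # Reconstruction loss keeps the features faithful; the L1 term keeps them sparse.
    loss = ((acts - recon) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice the inputs would be activations collected from a real model, and each learned dictionary direction is then inspected to see whether it fires for a single, human-interpretable concept.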

And this was a very toy paper, insofar as it was a two-layer, really small, really dumb transformer.