
Trenton Bricken

Speaker
1589 total appearances


Podcast Appearances

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

I mean, my very naive take here would just be that one thing the superposition hypothesis from interpretability has pushed is that your model is dramatically under-parameterized. And that's typically not the narrative under which deep learning has been pursued, right? But if you're trying to train a model on the entire internet and have it predict with incredible fidelity, you are in the under-parameterized regime, and you're having to compress a ton of things and take on a lot of noisy interference in doing so. And so with a bigger model, you can just have cleaner representations to work with.


Sure, yeah.

So the fundamental result, and this was before I joined Anthropic, but the paper's titled Toy Models of Superposition, finds that even for small models, if you are in a regime where your data is high-dimensional and sparse, and by sparse I mean any given data point doesn't appear very often, your model will learn a compression strategy, which we call superposition, so that it can pack more features of the world into it than it has parameters.

And the sparsity here, and I think both of these constraints apply to the real world, and modeling internet data is a good enough proxy for that, is that there's only one Dwarkesh. There's only one shirt you're wearing. There's this Liquid Death can here. And so these are all objects or features, and how you define a feature is tricky. And so you're in a really high-dimensional space because there are so many of them, and they appear very infrequently. And in that regime, your model will learn compression.

To riff a little bit more on this, I think it's becoming increasingly clear. I will say, I believe that the reason networks are so hard to interpret is, in large part, this superposition.

So if you take a model and you look at a given neuron in it, a given unit of computation, and you ask how this neuron contributes to the output of the model when it fires, and you look at the data that it fires for, it's very confusing. It'll be like 10% of every possible input, or like Chinese, but also fish, and trees, and full stops in URLs.