Trenton Bricken
I mean, my very naive take here would just be that one thing the superposition hypothesis that interpretability has pushed is that your model is dramatically under-parameterized.
And that's typically not the narrative that deep learning has pursued, right?
But if you're trying to train a model on the entire internet and have it predict it with incredible fidelity, you are in the under-parameterized regime,
and you're having to compress a ton of things and take on a lot of noisy interference in doing so.
And so having a bigger model, you can just have cleaner representations that you can work with.
Sure, yeah.
So the fundamental result, and this was before I joined Anthropic, but the paper's titled Toy Models of Superposition, finds that even for small models, if you are in a regime where your data is high dimensional,
and sparse, and by sparse I mean any given data point doesn't appear very often, your model will learn a compression strategy, which we call superposition, so that it can pack more features of the world into it than it has parameters.
And so the sparsity here — and I think both of these constraints apply to the real world, and modeling internet data is a good enough proxy for that — is, like, there's only one Dwarkesh.
Like there's only one shirt you're wearing.
There's, like, this Liquid Death can here.
And so these are all objects or features and how you define a feature is tricky.
And so you're in a really high dimensional space because there are so many of them and they appear very infrequently.
And in that regime, your model will learn compression.
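The compression strategy described above can be sketched in a few lines of numpy. This is a hedged illustration, not the Toy Models of Superposition setup itself: we assign each of 512 hypothetical features its own random direction in only 64 dimensions, so there are far more features than dimensions, and rely on sparsity — only one feature active at a time — to keep the interference between overlapping directions tolerable.

```python
import numpy as np

# More features than dimensions: 512 features packed into 64 dims.
rng = np.random.default_rng(0)
n_features, n_dims = 512, 64

# Each feature gets a random unit-norm direction; random directions in
# high dimensions are nearly (but not exactly) orthogonal.
directions = rng.normal(size=(n_features, n_dims))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Sparse input: a single feature is active.
active = np.zeros(n_features)
active[3] = 1.0

hidden = active @ directions      # compressed 64-dim representation
readout = hidden @ directions.T   # project back to all 512 features

# The active feature is recovered exactly (unit-norm direction),
# while every inactive feature picks up only a small interference term
# from the non-zero overlaps between directions.
print(readout[3])
print(np.abs(np.delete(readout, 3)).max())
```

If many features were active at once, those interference terms would add up and the readout would degrade — which is exactly why this trick only pays off in the sparse regime the quote describes.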
To riff a little bit more on this, I think it's becoming increasingly clear.
I will say, I believe that the reason networks are so hard to interpret is, in large part, because of this superposition.
So if you take a model and you look at a given neuron in it, a given unit of computation, and you ask, how is this neuron contributing to the output of the model when it fires?
And you look at the data that it fires for, it's very confusing.
It'll be like 10% of every possible input, or, like, Chinese, but also fish, and trees, and full stops in URLs.