Sholto Douglas

Speaker
1567 total appearances

Podcast Appearances

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Do you want to explain what feature splitting is?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So give an example, potentially, of that.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Okay, so let's go back to GPT-7.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

First of all, is this a sort of like linear tax on any model to figure out?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Even before that, is this a one-time thing you had to do or is this the kind of thing you have to do on every output?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Or is it just like one time, it's not deceptive, we're good to go roll?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Actually, yeah, let me let you answer that.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

For the audience, weights are

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

I don't know if permanent is the right word, but they are the model itself, whereas activations are the artifacts of any single call.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So there's going to be two steps to this for GPT-7 or whatever model we're concerned about.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

First, correct me if I'm wrong, but training the sparse autoencoder and doing the unsupervised projection into a wider space of features that have a higher fidelity to what is actually happening in the model.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And then secondly, label those features.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Because let's say like the cost of training the model is N. What will those two steps cost relative to N?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Although given the way that these features are not organized in, um,

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

things that are intuitive for humans, right?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Like, cause we just don't have to deal with basics before, so we don't have that many, you know, we just don't dedicate that much, like whatever, firmware to like deconstructing, which kind of basics before it is.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

How would we know that the subjects, and this will go back to maybe the MoE discussion we'll have of,

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

I guess we might as well talk about it, but like in mixture of experts, the Mixtral paper talked about how they couldn't find that the experts were specialized in a way that we could understand.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

There's not like a chemistry expert or a physics expert or something.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So why would you think that it will be like a biology feature and then you deconstruct it, rather than like blah, and then you just deconstruct it and it's like anthrax and you're like shoes and whatever?