Trenton Bricken

Speaker
1589 total appearances

Podcast Appearances

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Even with the feature discussion, defining what a feature is is really hard.

And so the question feels almost too slippery.

What is a feature?

A direction in activation space.
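
To make the "direction in activation space" definition concrete, here is a minimal sketch. It assumes a feature is represented by a vector `direction` and that the feature's activation is the projection of the model's activation vector `x` onto that direction after normalizing it to unit length. The function name `feature_activation` and the toy vectors are hypothetical illustrations, not values from any real model.

```python
import math

def feature_activation(x: list[float], direction: list[float]) -> float:
    """Project activation vector x onto the unit-normalized feature direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(xi * di for xi, di in zip(x, direction)) / norm

# Toy 4-dimensional activation and a hypothetical "bird" feature direction.
x = [0.5, 2.0, -1.0, 0.0]
bird_direction = [0.0, 3.0, 0.0, 0.0]
print(feature_activation(x, bird_direction))  # 2.0
```

Nothing requires the direction to line up with a single basis axis (a single neuron). A direction such as `[1.0, 1.0, 0.0, 0.0]` mixes two neurons, which is one reason a feature need not correspond to any one neuron.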

A latent variable that is operating behind the scenes that has causal influence over the system you're observing.

It's a feature if you call it a feature.

It's tautological.

These are all explanations that I feel some...

If that neuron corresponds to... to something in particular.

Right.

Yeah, yeah, yeah.

And no, I think that's useful as like, what do we want a feature to be, right?

Like what is a synthetic problem under which a feature exists?

But even with the Towards Monosemanticity work, we talk about what's called feature splitting, which is basically you will find as many features as you give the model the capacity to learn.

And by model here, I mean the up projection that we fit after we trained the original model.

And so if you don't give it much capacity, it'll learn a feature for bird.

But if you give it more capacity, then it will learn like ravens and eagles and sparrows and like specific types of birds.
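
The "up projection" fit on top of the trained model, and the idea that its capacity decides how many features you find, can be sketched as a toy sparse autoencoder. This is a minimal illustration under stated assumptions, not the code used in the Towards Monosemanticity work. It assumes a ReLU encoder, a linear decoder, a decoder bias subtracted before encoding, and a loss of reconstruction error plus an L1 sparsity penalty. The class name, `n_features`, and `l1_coeff` are hypothetical. The sketch does not train anything, and feature splitting only shows up after training on real activations.

```python
import random

class SparseAutoencoder:
    """Toy sparse autoencoder: up-projects a d_model activation into n_features."""

    def __init__(self, d_model: int, n_features: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.n_features = n_features  # the "capacity" knob: dictionary size
        # Encoder: n_features rows of length d_model. Decoder: d_model rows of length n_features.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x: list[float]) -> list[float]:
        # Subtract the decoder bias, apply the linear up projection, then ReLU.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return [max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, f: list[float]) -> list[float]:
        # Map the sparse feature activations back down to d_model.
        return [sum(w * fi for w, fi in zip(row, f)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, x: list[float], l1_coeff: float = 1e-3) -> float:
        # Mean squared reconstruction error plus an L1 penalty that encourages sparsity.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(v) for v in f)

x = [0.5, 2.0, -1.0, 0.0]
small = SparseAutoencoder(d_model=4, n_features=8)
large = SparseAutoencoder(d_model=4, n_features=64)
print(len(small.encode(x)), len(large.encode(x)))  # 8 64
```

The only difference between `small` and `large` is `n_features`. In the terms of the conversation, a small dictionary might end up spending one feature on "bird", while a larger one has room to split it into ravens, eagles, and sparrows once trained.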

I'm not sure what we would mean by... I mean, all of those things are like discrete units that have connections to other things that then imbue them with meaning.

That feels like a specific enough definition that it's useful and not too all-encompassing, but feel free to push back.

I mean, if the features we were finding weren't predictive or if they were just representations of the data, right, where it's like, oh, all you're doing is just clustering your data and there's no like higher level associations that are being made or it's some like phenomenological thing of like,