Trenton Bricken
And then you can kind of identify this black-hole region of feature space that everything else has been shifted away from.
There's this region, and you haven't yet put in an input that causes it to fire.
But then you can start searching: what input would cause this part of the space to fire?
What happens if I activate something in this space?
There are a whole bunch of other ways that you can try and attack that problem.
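To make that second idea concrete, here is a minimal PyTorch sketch of injecting a chosen direction into a hidden layer and observing the output. The model, layer, and feature_direction are all hypothetical placeholders, and it assumes the hooked layer returns a plain tensor; this is an illustration, not any specific method from the conversation.

```python
import torch

def steer_with_feature(model, layer, feature_direction, scale, inputs):
    """Run `model` on `inputs` while adding `scale * feature_direction`
    to the output of `layer` (a torch.nn.Module inside the model)."""
    direction = feature_direction / feature_direction.norm()

    def hook(module, hook_in, hook_out):
        # hook_out: (batch, seq_len, d_model); returning a value replaces it
        return hook_out + scale * direction

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            return model(inputs)
    finally:
        handle.remove()  # always detach the hook, even if the forward pass fails
```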
Yeah, this gets into the fun question of how universal features are across models. Our Towards Monosemanticity paper looked at this a bit.
I can't give you summary statistics, but take the base64 feature, for example, which we see across a ton of models.
There are actually three of them, but they'll fire for, and model, base64-encoded text, which is prevalent in every URL.
And there are lots of URLs in the training data.
They have really high cosine similarity across models.
So they all learn this feature.
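As a rough sketch of how one might check that across two models sharing the same hidden dimension (e.g., two seeds of the same architecture), here is hypothetical NumPy code that matches sparse-autoencoder decoder directions by cosine similarity; the decoder matrices are assumed inputs, not anything taken from the paper itself.

```python
import numpy as np

def best_matches(decoder_a: np.ndarray, decoder_b: np.ndarray):
    """decoder_a, decoder_b: [n_features, d_model] dictionary directions
    from two models that share the same hidden dimension."""
    a = decoder_a / np.linalg.norm(decoder_a, axis=1, keepdims=True)
    b = decoder_b / np.linalg.norm(decoder_b, axis=1, keepdims=True)
    sims = a @ b.T                            # [n_a, n_b] cosine similarities
    idx = sims.argmax(axis=1)                 # best model-B match per model-A feature
    return idx, sims[np.arange(len(a)), idx]  # match index and its similarity
```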
I mean, within a rotation, right?
Yeah, exactly.
I wasn't part of this analysis, but it definitely finds the feature, and they're pretty similar to each other across two separate models, the same architecture but trained with different random seeds.
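Since two seeds can learn the same features in rotated bases, one common trick for this kind of comparison (a sketch under that assumption, not necessarily the analysis described here) is to align the two representations with an orthogonal Procrustes rotation before measuring similarity; A and B are hypothetical activation matrices collected from the two models on the same inputs.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def similarity_up_to_rotation(A: np.ndarray, B: np.ndarray) -> float:
    """A, B: [n_samples, d] activations from two models on the same inputs."""
    R, _ = orthogonal_procrustes(A, B)  # orthogonal R minimizing ||A @ R - B||_F
    A_rot = A @ R
    num = (A_rot * B).sum(axis=1)
    den = np.linalg.norm(A_rot, axis=1) * np.linalg.norm(B, axis=1)
    return float((num / den).mean())    # mean row-wise cosine after alignment
```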
I think the David Bau lab paper supports this.
You have that ability, and you're just getting better at entity recognition, fine-tuning that circuit instead of other ones.
So it's not that there aren't other hypotheses.
It's just that I have been working on superposition for a number of years.