Trenton Bricken
That's a red flag.
You could also coarse-grain it so that it's just a single base64 feature.
I mean, even the fact that this came up and we could see that it specifically favors these particular outputs and it fires for these particular inputs gets you a lot of the way there.
I'm even familiar with cases from the auto-interp side where a human will look at a feature and annotate it as firing for Latin words, and then when you ask the model to classify it, it says it fires for Latin words defining plants. So the model can already beat the human in some cases at labeling what's going on. At scale, this would require an adversarial...
Yeah, but you can even automate this process, right?
I mean, this goes back to the determinism of the model.
You could have a model that is actively editing input text and predicting if the feature is going to fire or not, and figure out what makes it fire, what doesn't, and search the space.
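A minimal sketch of that search loop, assuming we can query a feature's activation on arbitrary text. `feature_activation` here is a toy stand-in (a crude proxy for a base64-ish feature that fires on long alphanumeric tokens); in practice it would run the model and read out the feature's activation.

```python
def feature_activation(text: str) -> float:
    # Toy stand-in for reading a feature's activation from the model:
    # fires on long, fully alphanumeric tokens (a crude base64 proxy).
    tokens = text.split()
    return max(
        (sum(c.isalnum() for c in t) / max(len(t), 1)) * (len(t) > 12)
        for t in tokens
    )

def probe_feature(text: str, threshold: float = 0.5):
    """Ablate each token in turn and record which ones the feature depends on."""
    tokens = text.split()
    base = feature_activation(text)
    influential = []
    for i in range(len(tokens)):
        edited = " ".join(tokens[:i] + tokens[i + 1:])
        # If deleting this token changes the activation a lot, it matters.
        if edited and abs(feature_activation(edited) - base) > threshold * base:
            influential.append(tokens[i])
    return base, influential
```

A real version would let a model propose the edits (paraphrases, substitutions, adversarial strings) rather than just deleting tokens, but the structure of the search is the same.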
Especially for scalability.
I think it's underappreciated right now.
I mean, so at some point, I think you might just start fitting noise or things that are part of the data, but that the model isn't actually representing.
Yeah, yeah.
So it's the part before that, where the model will learn however many features it has capacity for that still span the representation space.
Yeah, so if you don't give the model that much capacity for the features it's learning (concretely, if you don't project to as high-dimensional a space), it will learn one feature for birds.
But if you give the model more capacity, it will learn features for all the different types of birds.
And so it's more specific than otherwise.
And oftentimes, there's the bird vector that points in one direction, and all the other specific types of birds point in a similar region of the space, but are obviously more specific than the coarse label.
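A toy illustration of that geometry (not a trained dictionary): modeling each species feature as the shared "bird" direction plus a small species-specific offset, the specific vectors all stay close to the coarse one. The dimension and offset scale are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Coarse "bird" direction, normalized to unit length.
bird = rng.normal(size=d)
bird /= np.linalg.norm(bird)

# Species features: the bird direction plus a small random offset each.
species = [bird + 0.05 * rng.normal(size=d) for _ in range(5)]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Every species vector points in a similar region of the space as "bird".
sims = [cos(bird, s) for s in species]
```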
Yeah, so you do dictionary learning after you've trained your model.
And you feed it a ton of inputs.
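The setup being described, sketched as a sparse autoencoder's forward pass over a batch of (here, random stand-in) activations. The dimensions, tied initialization, and L1 penalty weight are all assumptions for illustration; a real run would collect activations from the trained model and optimize this loss over many inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_dict, n = 128, 1024, 256   # assumed sizes for the sketch

# Stand-in for activations collected from the trained model on many inputs.
acts = rng.normal(size=(n, d_model))

# Encoder/decoder of the dictionary; tied transpose init is one common choice.
W_enc = rng.normal(size=(d_model, d_dict)) / np.sqrt(d_model)
b_enc = np.zeros(d_dict)
W_dec = W_enc.T.copy()
b_dec = np.zeros(d_model)

# Project into the (higher-dimensional) feature space with a ReLU,
# then reconstruct the original activations from the sparse codes.
features = np.maximum(acts @ W_enc + b_enc, 0.0)
recon = features @ W_dec + b_dec

# Reconstruction error plus an L1 sparsity penalty on the feature codes.
mse = ((recon - acts) ** 2).mean()
l1 = np.abs(features).mean()
loss = mse + 4e-3 * l1
```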