Trenton Bricken

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And you get the activations from those.

And then you do this projection into the higher dimensional space.

And so the method is unsupervised in that it's trying to learn these sparse features.

You're not telling them in advance what they should be.

But it is constrained by the inputs you're giving the model.
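
The steps described above (record activations, project them into a higher-dimensional space, learn sparse features without labels) can be sketched as a small sparse autoencoder. This is a minimal illustration under stated assumptions, not the speaker's actual implementation: the L1 sparsity penalty, the ReLU encoder, the plain gradient-descent loop, the function names `encode` and `train_sparse_autoencoder`, and all hyperparameters are hypothetical choices, and random data stands in for real model activations.

```python
import numpy as np


def encode(acts, params):
    """Project activations into the wider feature space and apply ReLU."""
    W_enc, b_enc, _, _ = params
    return np.maximum(acts @ W_enc + b_enc, 0.0)


def train_sparse_autoencoder(acts, n_features, l1_coeff=1e-3, lr=1e-2,
                             steps=300, seed=0):
    """Fit a one-layer sparse autoencoder to activations of shape (n, d).

    Assumed loss: mean squared reconstruction error plus an L1 penalty
    on the feature activations. No labels are used, so the features are
    learned unsupervised.
    """
    rng = np.random.default_rng(seed)
    n, d = acts.shape
    W_enc = rng.normal(scale=0.1, size=(d, n_features))
    b_enc = np.zeros(n_features)
    W_dec = rng.normal(scale=0.1, size=(n_features, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(steps):
        # Forward pass: encode to features, decode back to activations.
        pre = acts @ W_enc + b_enc
        feats = np.maximum(pre, 0.0)
        err = feats @ W_dec + b_dec - acts
        loss = (err ** 2).sum(axis=1).mean() \
            + l1_coeff * np.abs(feats).sum(axis=1).mean()
        losses.append(loss)

        # Backward pass: gradients of the loss above, computed by hand.
        d_recon = 2.0 * err / n
        g_W_dec = feats.T @ d_recon
        g_b_dec = d_recon.sum(axis=0)
        d_feats = d_recon @ W_dec.T + l1_coeff * np.sign(feats) / n
        d_pre = d_feats * (pre > 0)
        g_W_enc = acts.T @ d_pre
        g_b_enc = d_pre.sum(axis=0)

        # Full-batch gradient descent step.
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
    return (W_enc, b_enc, W_dec, b_dec), losses


# Stand-in for activations recorded from a model on a chosen dataset.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 8))

# 32 features for 8-dimensional activations: the higher-dimensional projection.
params, losses = train_sparse_autoencoder(acts, n_features=32)
features = encode(acts, params)
print(features.shape, losses[0], losses[-1])
```

Because the features are fit only to the rows of `acts`, what the dictionary can learn is limited by which inputs produced those activations, which is the constraint mentioned above.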

I guess two caveats here.

One, we can try and choose what inputs we want.

So if we're looking for theory of mind features that might lead to deception, we can put in the sycophancy data set.

Hopefully at some point we can move into looking at the weights of the model alone, or at least using that information to do dictionary learning.

But I think in order to get there, that's like such a hard problem that you need to make traction on just learning what the features are first.

But yeah, so what's the cost of this?

Can you repeat the last sentence? Weights of the model alone?

So like right now we just have these neurons in the model.

They don't make any sense.

We apply dictionary learning, we get these features out, they start to make sense.

But that depends on the activations of the neurons.

The weights of the model itself, like what neurons are connected to what other neurons, certainly has information in it.

And the dream is that we can kind of bootstrap towards actually making sense of the weights of the model
