Trenton Bricken

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

that are independent of the activations of the data.

9493.568 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

I'm not saying we've made any progress here.

9497.177 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

It's a very hard problem, but it feels like we'll have a lot more traction and be able to sanity check what we're finding with the weights if we're able to pull out features first.

9499.483 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

We will see.

9559.216 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Like it really depends on two main things.

9560.218 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

What is your expansion factors?

9563.907 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Like how much are you projecting into the higher dimensional space?

9565.23 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And how much data do you need to put into the model?

9567.295 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

How many activations do you need to give it?

9569.2 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But this brings me back to the feature splitting to a certain extent.

9571.547 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Because if you know you're looking for specific features, you can start with a really cheaper course representation.

9574.214 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So maybe my expansion factor is only two.

9582.938 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So I have 1,000 neurons I'm projecting to a 2,000 dimensional space.

9585.845 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

I get 2,000 features out, but they're really coarse.

9589.333 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And so previously, I had the example for birds.

9591.679 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Let's move that example to I have a biology feature.

9594.325 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But I really care about if the model

9598.355 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

has representations for bioweapons and is trying to manufacture them.

9600.64 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And so what I actually want is like an anthrax feature.

9604.528 View full episode →

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

What you can then do is rather than, and let's say the anthrax, you only see the anthrax feature if instead of going from a thousand dimensions to 2000 dimensions, I go to a million dimensions, right?

9607.634 View full episode →

Appearances Over Time

Podcast Appearances

Sign in to Audioscrape

Share this moment