Trenton Bricken
And it makes sense, right?
It's like you get better at attending to the positions of different things, which you need for coding and for manipulating math equations.
I love this kind of research.
What's the name of the paper?
Do you know?
If you look up fine-tuning models on math, it's from David Bau's group and came out a week ago.
I'm not endorsing the paper.
That's a longer conversation, but it does talk about and cite other work on this entity recognition ability.
Normies who listen will be like, you know... My two immediate cached responses to this are: one, the work on Othello, and now other games, where I give you a sequence of moves in the game, and it turns out that if you apply some pretty straightforward interpretability techniques, you can recover the board state that the model has learned.
And it's never seen the game board before or anything, right?
Like that's generalization.
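To make the Othello point concrete, here is a minimal sketch of the kind of probing involved, assuming you have already extracted per-move activations from the model. The arrays `activations` and `board_labels` and the random stand-in data are purely illustrative, not the setup from the actual papers (which used more careful probe designs).

```python
# Hypothetical sketch of a linear probe in the spirit of the Othello-GPT work:
# given hidden activations at each move, predict the state of each board square.
# All data here is random stand-in data; in practice you would extract activations
# from the model and labels from the game simulator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_moves, d_model, n_squares = 500, 128, 64
activations = rng.normal(size=(n_moves, d_model))               # stand-in residual-stream activations
board_labels = rng.integers(0, 3, size=(n_moves, n_squares))    # 0=empty, 1=mine, 2=theirs

probes = []
for sq in range(n_squares):
    # One probe per square: can that square's state be read off the activations linearly?
    probe = LogisticRegression(max_iter=1000).fit(activations, board_labels[:, sq])
    probes.append(probe)

accuracy = np.mean([p.score(activations, board_labels[:, i]) for i, p in enumerate(probes)])
print(f"mean probe accuracy: {accuracy:.2f}")
```

On random stand-in data the accuracy is near chance; the striking result in the Othello work is that on real activations the recovered board is highly accurate.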
The other is Anthropic's influence functions paper that came out last year, where they look at model outputs like, "Please don't turn me off. I want to be helpful."
And then they scan for what data led to that.
And one of the data points that was very influential was someone dying of dehydration in the desert and having a will to keep surviving.
And to me, that just seems like very clear generalization of the motive, rather than regurgitating "don't turn me off."
I think 2001: A Space Odyssey was also one of the influential things.
And so that's more related, but it's clearly pulling in things from lots of different distributions.
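The core idea behind that scan can be sketched in a few lines, assuming a toy model and toy examples in place of a language model, a training document, and a query completion. This is only the gradient-alignment heart of the idea, not Anthropic's actual implementation, which uses EK-FAC-approximated influence functions with an inverse-Hessian term omitted here.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a tiny model and two "examples". In the real setting these would be
# a language model, a candidate training document, and a query completion.
torch.manual_seed(0)
model = nn.Linear(8, 1)
train_x, train_y = torch.randn(4, 8), torch.randn(4, 1)
query_x, query_y = torch.randn(1, 8), torch.randn(1, 1)

def flat_grad(loss):
    """Flatten the gradient of `loss` with respect to all model parameters."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

# A training example counts as influential for a query to the extent its loss gradient
# aligns with the query's loss gradient; ranking training data by this score is the
# basic move behind "what data led to that output".
mse = nn.MSELoss()
g_train = flat_grad(mse(model(train_x), train_y))
g_query = flat_grad(mse(model(query_x), query_y))
print("influence score:", torch.dot(g_train, g_query).item())
```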
Or induction heads.
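For reference, induction heads are usually spotted with a simple behavioral test: on a sequence that repeats, such a head attends from a token in the second copy back to the token that followed the same token in the first copy. The sketch below only illustrates the scoring rule; the attention matrix is a random placeholder, where a real one would come from running a model on the repeated sequence.

```python
# Hedged sketch of the standard induction-head score. `attn` stands in for one head's
# attention pattern (seq_len x seq_len) on a sequence of random tokens repeated twice.
import numpy as np

rng = np.random.default_rng(0)
half = 50
seq = np.concatenate([rng.integers(0, 1000, size=half)] * 2)   # tokens repeated once
attn = rng.dirichlet(np.ones(2 * half), size=2 * half)         # placeholder attention rows

# Induction score: average attention from position i in the second copy back to
# position i - half + 1, i.e. the token that followed this same token last time.
score = np.mean([attn[i, i - half + 1] for i in range(half, 2 * half)])
print(f"induction score for this head: {score:.3f}")
```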