Trenton Bricken

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And I'm not even interested in cases where the model is necessarily just modeling the fact that deception has occurred.

10474.332

Dwarkesh Podcast

So in an ideal world, we could just train on the whole data distribution and then find the directions that matter.

10500.263

To the extent that we need to reluctantly narrow down the subset of data that we're looking over, just for the purposes of scalability, we would use data that looks like the data you'd use to fit a linear probe.

10510.357

But again, with a linear probe, you're also just finding one direction.

10520.531

We're finding a bunch of directions here.

10525.197
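
The contrast between a linear probe (one direction, fit with labels) and dictionary learning (many directions, fit without labels) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the team's actual method: the activations are synthetic, the probe is a simple difference-of-means probe, and the dictionary comes from a tiny sparse autoencoder trained with plain gradient descent. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, m = 2000, 16, 3, 8  # samples, activation dim, planted features, dictionary size

# Synthetic "activations": sparse mixtures of k planted orthonormal directions.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))
planted = q[:, :k].T                                   # (k, d)
active = rng.random((n, k)) < 0.2                      # which features fire per sample
coeffs = active * rng.uniform(0.5, 1.5, size=(n, k))   # feature strengths
X = coeffs @ planted + 0.05 * rng.normal(size=(n, d))

# Linear probe (difference-of-means variant): needs labels, yields ONE direction.
labels = active[:, 0]
probe_direction = X[labels].mean(axis=0) - X[~labels].mean(axis=0)
probe_direction /= np.linalg.norm(probe_direction)

# Dictionary learning (tiny sparse autoencoder): no labels, yields m directions.
W_enc = 0.1 * rng.normal(size=(d, m))
b_enc = np.zeros(m)
W_dec = 0.1 * rng.normal(size=(m, d))
lr, l1 = 0.05, 0.01  # step size and sparsity penalty (illustrative values)
losses = []
for _ in range(300):
    pre = X @ W_enc + b_enc
    f = np.maximum(pre, 0.0)          # non-negative feature activations
    err = f @ W_dec - X               # reconstruction error
    losses.append((err ** 2).sum(axis=1).mean() + l1 * f.sum(axis=1).mean())
    # Gradients of the loss above, computed before any parameter is updated.
    d_out = 2.0 * err / n
    d_f = d_out @ W_dec.T + l1 * (f > 0) / n
    d_pre = d_f * (pre > 0)
    W_dec -= lr * (f.T @ d_out)
    W_enc -= lr * (X.T @ d_pre)
    b_enc -= lr * d_pre.sum(axis=0)

dictionary = W_dec  # each row is one learned direction in activation space
```

The probe returns a single vector of shape `(d,)` tied to the one labeled concept, while `dictionary` holds `m` candidate directions learned from the same activations with no labels at all.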

Totally, yeah.

10537.592

I'm optimistic.

10549.606

So I guess one thing is that this is a bad time to answer this question, because we are explicitly investing in the longer term of ASL-4 models, which GPT-7 would be.

10551.209

So we split the team where a third is focused on scaling up dictionary learning right now.

10560.669

And that's been great.

10564.397

I mean, we publicly shared some of our eight-layer results.