Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Trenton Bricken

๐Ÿ‘ค Speaker
See mentions of this person in podcasts
1589 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Um, and I'm not even interested in cases where the model is, is necessarily just like modeling the fact that deception has occurred.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So in an ideal world, we could just train on the whole data distribution and then find the directions that matter.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

To the extent that we need to reluctantly narrow down the subset of data that we're looking over, just for the purposes of scalability, we would use data that looks like the data you'd use to fit a linear probe.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But again, with a linear probe, you're also just finding one direction.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

We're finding a bunch of directions here.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Totally, yeah.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

I'm optimistic.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So I guess one thing is this is a bad time to answer this question because we are explicitly investing in the longer term of ASL 4 models, which GPT-7 would be.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So we split the team where a third is focused on scaling up dictionary learning right now.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And that's been great.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

I mean, we publicly shared some of our eight-layer results.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

We've scaled up quite a lot past that at this point.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But the other two groups, one is trying to identify circuits and then the other is trying to get the same success for attention heads.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So we're setting ourselves up and building the tools necessary to really find these circuits in a compelling way.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But it's going to take another, I don't know, six months before that's really working well.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But I can say that I'm optimistic and we're making a lot of progress.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Yeah.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Yeah.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Yeah.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

association or whatever you found?