Sholto Douglas

Speaker
1567 total appearances

Podcast Appearances

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And then the feature for like, you know, I guess maybe it's true because like the mass is like a gradient and like, you know, like, I don't know, but the polarity or whatever is a gradient as well.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But there's also a sense in which like there's the laws and the laws are more general and you have to understand like the general laws.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

What is a compelling explanation to you, especially for very smart models of

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Um, like I understand why it made this output and it was like for a legit reason.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

If it's doing million line pull requests or something, what are you seeing at the end of that request where you're like, yep, that's chill.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But before I trace down on that, um,

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

What does the reasoning circuit look like?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

What would that look like when you found it?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Something is happening where when you pick up a new game and you immediately start understanding how to play it, and it doesn't seem like an Induction Heads kind of thing.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

What would that... Because Induction Heads is like one-layer transformers.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Either two layers, yeah.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So you can kind of see the thing that is a human picks up a new game and understands it.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

How would you think about what that is?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Presumably it's across multiple layers, but what would that physically look like?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

How big would it be maybe?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And, but like, is the story of how you found it with the reasoning thing is like, cause you won't be able to understand, or it'll just be like really, you know, it won't be something you can see in like a two layer transformer.

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

So will you just be like,

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

the circuit for deception or whatever, it's just this part of the network fired when we, at the end, identified the thing as being deceptive, and it didn't fire when we didn't identify it as being deceptive, therefore this must be the deception circuit?

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And that requires us at the end to be able to label which one is like bad and which one is good.
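The labeled-contrast idea raised in this exchange can be sketched roughly as follows. This is an illustrative toy, not a method described in the episode: the function name `contrast_scores`, the activation values, and the labels are all made up, and a unit that correlates with a label is only a candidate, since correlation alone does not show the unit plays a causal role.

```python
from statistics import mean

def contrast_scores(activations, labels):
    """Per-unit mean activation on labeled-deceptive examples minus the mean on the rest."""
    n_units = len(activations[0])
    scores = []
    for u in range(n_units):
        on = [row[u] for row, lab in zip(activations, labels) if lab]
        off = [row[u] for row, lab in zip(activations, labels) if not lab]
        scores.append(mean(on) - mean(off))
    return scores

# Hypothetical toy data: 6 examples x 4 units.
# Unit 2 is constructed to fire mostly on the examples labeled deceptive.
acts = [
    [0.1, 0.5, 0.9, 0.2],
    [0.2, 0.4, 1.0, 0.1],
    [0.1, 0.6, 0.8, 0.3],
    [0.2, 0.5, 0.0, 0.2],
    [0.1, 0.4, 0.1, 0.1],
    [0.2, 0.6, 0.0, 0.3],
]
is_deceptive = [True, True, True, False, False, False]  # human-provided labels (assumed)

scores = contrast_scores(acts, is_deceptive)
# The unit with the largest absolute contrast is the candidate "circuit" component.
candidate_unit = max(range(len(scores)), key=lambda u: abs(scores[u]))
```

The sketch depends entirely on the labels being right, which is the point made in the snippet above: someone has to be able to say which outputs were bad and which were good.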

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Although it's doing that by, I don't know, ChatGPT, I think it's probably modeling me because that's like what RLHF induces them to.