Trenton Bricken

And the fact that the model learned these three different features and it took us a little while to figure out what was going on is very Shoggoth-esque.

9165.66

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

Yeah, and it's clearly doing something that humans wouldn't.

9182.739

You can even talk to any of the current models in Base64 and it will reply in Base64.

9186.886

And you can then decode it and it works great.
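
For readers unfamiliar with the encoding step, here is a minimal Python sketch of the Base64 round trip being described. It uses only the standard library; the example prompt and the helper names `to_base64` and `from_base64` are illustrative assumptions, and no model call is shown.

```python
import base64


def to_base64(text: str) -> str:
    """Encode a UTF-8 string as Base64 text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Decode Base64 text back into a UTF-8 string."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical prompt, used only to show the round trip.
prompt = "What is the capital of France?"
encoded_prompt = to_base64(prompt)

# encoded_prompt is what would be sent to a model; a Base64 reply
# would be passed through from_base64 in the same way.
round_trip = from_base64(encoded_prompt)
```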

9191.873

No, no, I mean, you could do that, right?

9230.68

What I was going to say is one technique here is anomaly detection.

9231.862

And so one beauty of dictionary learning instead of linear probes is that it's unsupervised.

9235.226

You are just trying to learn to span all of the representations that the model has and then interpret them later.

9240.593

But if there's a weird feature that suddenly fires for the first time that you haven't seen fire before...
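
The idea of flagging a dictionary feature that fires for the first time can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the speakers' method: the activation values, the firing threshold, and the function names `seen_features` and `novel_features` are all hypothetical, and the activations are assumed to come from some already-trained dictionary.

```python
from typing import Iterable, List, Set


def seen_features(reference: Iterable[List[float]], threshold: float = 0.0) -> Set[int]:
    """Collect indices of features that fired on at least one reference input."""
    seen: Set[int] = set()
    for activations in reference:
        seen.update(i for i, a in enumerate(activations) if a > threshold)
    return seen


def novel_features(activations: List[float], seen: Set[int], threshold: float = 0.0) -> List[int]:
    """Return features active on this input that never fired on the reference set."""
    return [i for i, a in enumerate(activations) if a > threshold and i not in seen]


# Hypothetical activations: each row is one input, each column one dictionary feature.
reference_activations = [
    [0.9, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.7],
]
seen = seen_features(reference_activations)

# A new input on which feature 2 fires although it never fired on the reference set.
flagged = novel_features([0.5, 0.0, 1.3, 0.0], seen)
```

Because the dictionary is learned without labels, a check like this needs no advance definition of what "weird" means; it only needs a record of which features have fired before.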

9247.763