Trenton Bricken

Speaker

1589 total appearances

Podcast Appearances

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

But its chain of thought is totally misleading.

4796.868

Like it will make up random stuff that sounds plausible or that tries to sound as plausible as possible.

4799.773

But it's not at all representative of like the true answer.

4807.426

Totally.

4838.508

Yeah, yeah.

4838.948

It's just some people will hail chain of thought reasoning as a great way to solve AI safety.

4839.469

Oh, I see.

4846.356

And it's like, actually, we don't know whether we can trust it.

4846.876

And that's like two very simple agents, right?

4940.35

I mean, I think a nice halfway house here would be features that you learned from dictionary learning.

4941.933

Yeah, that would be really cool.

4946.56

Where it's like you get more internal access, but a lot of it is much more human interpretable.

4948.383

You can also have each of the smaller, well, each of the agents can be a smaller model that's cheaper to run, and you can fine-tune it so that it's actually good at the task.

5080.163

Yeah, you can't train on the RL reward unless the model generates some reward.

5191.516

Lets you encode the important shit to not die.

5276.827

Yeah, I'm the wrong person to ask, but there are interesting interpretability pieces where if we fine-tune on math problems, the model just gets better at entity recognition.

5423.116

So there's like a...

5437.206

a paper from David Bau's lab recently where they investigate what actually changes in a model when I fine-tune it with respect to the attention heads and these sorts of things.

5438.168

And they have this like synthetic problem of box A has this object in it.

5447.657

Box B has this other object in it.

5453.582