And if adaptive compute is solved, you can keep doing that.
And in this case, if there's a quadratic penalty for attention, but you're doing long context anyway, then you're still dumping in more compute, not during training or by having bigger models, but just through the context itself.
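To make that quadratic penalty concrete, here's a rough back-of-the-envelope sketch in Python (the constants are mine, not anything from the conversation): holding the model size fixed and only growing the context, the attention term in the compute bill scales with the square of the token count.

```python
def attention_flops(seq_len: int, d_model: int, n_layers: int) -> float:
    """Very rough FLOP count for the attention score and value-mixing steps.

    Per layer: the QK^T score matrix and the weighted sum over values each cost
    about 2 * seq_len^2 * d_model multiply-adds, so ~4 * seq_len^2 * d_model total.
    """
    per_layer = 4 * (seq_len ** 2) * d_model
    return n_layers * per_layer

for n in (1_000, 10_000, 100_000):
    flops = attention_flops(n, d_model=4096, n_layers=32)  # illustrative sizes
    print(f"{n:>7} tokens -> {flops:.2e} attention FLOPs")
# 10x more context -> ~100x more compute in this term: the quadratic penalty.
```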
Break down for me, you were referring to this in some of your previous answers:
listen, you have these long contexts and you can hold more things in memory, but it ultimately comes down to your ability to mix concepts together to do some kind of reasoning.
And these models aren't necessarily human level at that, even in context.
Break down for me how you see storing just raw information versus reasoning and what's in between.
Like, where's the reasoning happening?
And where's the storing of raw information happening?
What's different between them in these models?
Before we get deeper into this, we should explain to the audience: you referred earlier to Anthropic's way of thinking about transformers as these read-write operations that layers do.
One of you should just kind of explain at a high level what you mean by that.
Yeah.
I might just dumb it down, in a way that would have made sense to me a few months ago. Okay, so you have...
you know, whatever words are in the input you put into the model, all those words get converted into these tokens and those tokens get converted into these vectors.
And basically it's just like this small amount of information that's moving through the model.
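As a minimal sketch of that step (made-up vocabulary and tiny dimensions, nothing resembling a real model), the words become token ids, the ids become vectors, and those vectors are the only thing the model carries forward:

```python
import numpy as np

vocab = {"10": 0, "plus": 1, "5": 2, "equals": 3}   # toy vocabulary
d_model = 8                                          # real models use thousands of dims
embedding = np.random.randn(len(vocab), d_model)

tokens = [vocab[w] for w in "10 plus 5 equals".split()]
residual_stream = embedding[tokens]                  # shape: (num_tokens, d_model)

# From here on, every layer reads from and writes back into this
# (num_tokens x d_model) array; that's the "small amount of information"
# moving through the model.
print(residual_stream.shape)                         # (4, 8)
```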
And the way you explained it to me, Sholto, and the way this paper talks about it, is this:
Early on in the model, maybe it's just doing some very basic things about what these tokens mean. Like, if it says 10 plus 5, it's just moving information about to have that good representation. Exactly, just representing it. In the middle, maybe the deeper thinking is happening about how to think, how to solve this. At the end, you're converting it back into the output token, because the end product is you're trying to predict the probability of the next token from the last of those residual streams.
And so, yeah, it's interesting to think about this small, compressed amount of information moving through the model and how it's getting modified in different ways.
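A schematic of that read-write picture, with the layer internals stubbed out because only the data flow is the point here (names and sizes are illustrative, not taken from any actual model):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def layer_update(x: np.ndarray) -> np.ndarray:
    # Stand-in for whatever an attention or MLP block would compute after
    # "reading" the residual stream; a real layer is much more than noise.
    return 0.01 * np.random.randn(*x.shape)

n_layers, n_tokens, d_model, vocab_size = 4, 4, 8, 50
residual_stream = np.random.randn(n_tokens, d_model)          # token embeddings

for _ in range(n_layers):
    residual_stream = residual_stream + layer_update(residual_stream)  # "write" back

# The next token is predicted from the final residual stream at the last position.
unembedding = np.random.randn(d_model, vocab_size)
next_token_probs = softmax(residual_stream[-1] @ unembedding)
print(next_token_probs.shape)                                  # (50,)
```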
Trenton, it's interesting.
You're one of the few people who have a background in neuroscience.