Sholto Douglas

Speaker
1567 total appearances

Podcast Appearances

Dwarkesh Podcast
Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

And if adaptive compute is solved, you can keep doing that.

And in this case, if there's a quadratic penalty for attention but you're doing long context anyway, then you're still dumping in more compute, not during training and not by having bigger models, but just, yeah.
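
To make the "quadratic penalty" concrete, here is a minimal sketch under stated assumptions, not a figure from the conversation. It counts only the two matrix products inside a single attention head (scores and the weighted sum of values) at 2 FLOPs per multiply-add, and ignores projections, masking, and multiple heads. The function name `attention_flops` and the sizes used are hypothetical.

```python
def attention_flops(seq_len, head_dim):
    """Rough FLOP count for one attention head over a sequence (illustrative only)."""
    # Scores: every position is compared with every position (seq_len x seq_len
    # dot products of length head_dim, 2 FLOPs per multiply-add).
    scores = 2 * seq_len * seq_len * head_dim
    # Weighted sum of values: the same shape of work again.
    weighted_sum = 2 * seq_len * seq_len * head_dim
    return scores + weighted_sum


short = attention_flops(1024, 128)
long = attention_flops(2048, 128)
ratio = long / short  # doubling the context quadruples this cost
```

Under these assumptions the cost grows with the square of the context length, which is why a longer context means more compute spent at inference time even when the model itself is no bigger.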

Break down for me, you're referring to this in some of your previous answers of

listen, you have these long contexts and you can hold more things in memory, but it ultimately comes down to your ability to mix concepts together to do some kind of reasoning.

And these models aren't necessarily human level at that, even in context.

Break down for me how you see storing just raw information versus reasoning and what's in between.

Like, where's the reasoning happening?

And where's just the storing of raw information happening?

What's different between them in these models?

Before we get deeper into this, we should explain to the audience, you referred earlier to Anthropic's way of thinking about transformers as these read-write operations that layers do.

One of you should just kind of explain at a high level what you mean by that.

Yeah.

I might just dumb it down, in a way that would have made sense to me a few months ago. Okay, so you have...

you know, whatever words are in the input you put into the model, all those words get converted into these tokens and those tokens get converted into these vectors.

And basically it's just like this small amount of information that's moving through the model.

And the way you explained it to me, Sholto, what this paper talks about is

early on in the model, maybe it's just doing some very basic things about what these tokens mean. Like if it says "10 plus 5," it's just moving information about to have that good representation. Exactly, just represent. In the middle, maybe the deeper thinking is happening about how to think. Yeah, how to solve this. At the end, you're converting it back into the output token, because the end product is you're trying to predict the probability of the next token from the last of those residual streams.

And so, yeah, it's interesting to think about just the small compressed amount of information moving through the model, and it's getting modified in different ways.
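
The read-write picture described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not the architecture of any real model: the "attention" step is a plain causal average rather than learned attention, there is no normalization, the weights are random, and every name here (`forward`, `random_matrix`, the sizes) is hypothetical. It only shows the shape of the idea: tokens become vectors, each layer reads from the residual stream and adds its update back, and the last position's stream is turned into next-token probabilities.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def forward(token_ids, embed, layers, unembed):
    # Tokens -> vectors: one residual stream per input position.
    streams = [list(embed[t]) for t in token_ids]
    for read_w, write_w in layers:
        new_streams = []
        for pos, stream in enumerate(streams):
            # Stand-in for attention: average the streams at this and earlier positions.
            visible = streams[: pos + 1]
            mixed = [sum(col) / len(visible) for col in zip(*visible)]
            # "Read" from the stream, apply a nonlinearity, then "write" back.
            hidden = [max(0.0, h) for h in matvec(read_w, mixed)]
            update = matvec(write_w, hidden)
            # The layer adds to the stream rather than replacing it.
            new_streams.append([s + u for s, u in zip(stream, update)])
        streams = new_streams
    # Next-token probabilities come from the last position's residual stream.
    return softmax(matvec(unembed, streams[-1]))


rng = random.Random(0)
vocab, d_model, d_hidden = 8, 4, 6
embed = random_matrix(vocab, d_model, rng)
layers = [
    (random_matrix(d_hidden, d_model, rng), random_matrix(d_model, d_hidden, rng))
    for _ in range(3)
]
unembed = random_matrix(vocab, d_model, rng)
probs = forward([1, 5, 2], embed, layers, unembed)
```

Here `probs` is a distribution over the toy vocabulary. The design point is the `s + u` line: each layer modifies the same small vector instead of building a new one.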

Trenton, it's interesting.

You're one of the few people who have a background in neuroscience.