Sholto Douglas
Yeah, you can come up to speed reasonably fast.
And it teaches you a lot of good intuitions about the actual intricacies of what's going on in the models, which means you're then very well placed to think about architecture and that kind of thing.
One of my favorite people to think about architecture with at Anthropic at the moment actually came from a heavy GPU kernel programming background. They just know the ins and outs really deeply and can think about the trade-offs really well.
Yeah, it was fun.
So in other words, my parents will finally understand what I do for a job.
What do they do?
They're like, cool.
Let's do it.
All right.
So Brian Krav asks about the issue you raised with Dario and occasionally tweet about: models not making connections across disparate topics, some sort of combinatorial attention challenge.
What are your thoughts on that now?
Do you solve it with scale, thinking models, or something else?
I think my answer at the moment is that the pre-training objective imbues you with this nice, flexible, general knowledge about the world, but it doesn't necessarily imbue you with the skill of making novel connections, or of doing research: the kinds of things that people are trained to do through PhD programs and through the process of exploring and interacting with the world.
And so, yeah, I think at a minimum you need significant RL on at least similar things to be able to approach making novel discoveries. And I would like to see some early evidence of this as we start to build models that are interacting with one another, trying to make scientific discoveries, and modeling the behaviors that we expect of people in these positions, because I don't actually think we've done that in a meaningful or scaled way as a field, so to speak.
A little bit like Gwern's theory, the optimizer theory, no?
I get asked this question all the time.