Sholto Douglas
the things you would have learned from more language tokens anyway, you can just get from YouTube.
Has that actually been the case?
How much positive transfer do you see between different modalities? Are the images actually helping you be better at writing code or something, just because the model is learning latent capabilities from trying to understand the image?
One of the things you mentioned to me a long time ago is the evidence that when you train LLMs on code, they get better at reasoning and language.
Which, unless it's the case that the comments in the code are just really high-quality tokens or something, implies that learning to think through how to code makes you a better reasoner. And that's crazy, right?
I think that's one of the strongest pieces of evidence that scaling just makes the thing smart.
That kind of positive transfer.
Right.
And crucially, the thing that makes this significant is that it's not just stochastically predicting the next token of words or whatever, because it's learned that Sally corresponds to the murderer at the end of a Sherlock Holmes story. No, if there is some shared structure between code and language, the model must have learned it at a deeper level.
Okay, so I want to take a step back from the research and ask about your careers specifically, because the tweet I introduced you with implied you've been in this field a year and a half.
I think you've only been in it for a year or something, right?
It's like...
Yeah.
But, you know, in that time you've done something pretty incredible. I know the "solved alignment" takes are overstated, and you won't say this yourself because you'd be embarrassed by it, but it's the thing that people in mechanistic interpretability think is the biggest step forward.