Sholto Douglas
I think he's probably talked about it, but yeah.
That's actually a very bearish sign. We were chatting with one of our friends, and he made the point that if you look at what new applications GPT-4 unlocks relative to GPT-3.5, it's not clear there are that many.
GPT-3.5 can do Perplexity or whatever.
If there is this diminishing increase in capabilities, and that increase costs exponentially more to get, that's a bearish sign on what GPT-4.5 will be able to do, or what GPT-5 will unlock in terms of economic impact.
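To make that intuition concrete, here is a rough sketch of why exponentially more compute can buy shrinking gains, assuming a Chinchilla-style power law for loss versus compute; the constants and the exponent below are made-up illustrative values, not fitted numbers.

```python
import numpy as np

# Illustrative only: loss ~ A * C^(-alpha), with assumed constants (not measured).
A, alpha = 10.0, 0.05

# Hypothetical training compute per model generation, each 10x the last.
compute = np.array([1e22, 1e23, 1e24, 1e25])
loss = A * compute ** (-alpha)

for c, l, l_next in zip(compute[:-1], loss[:-1], loss[1:]):
    # Each 10x jump in compute buys a shrinking absolute improvement in loss.
    print(f"{c:.0e} FLOPs -> loss {l:.3f}; 10x more compute improves it by {l - l_next:.3f}")
```

Under any power law with a small exponent, the same multiplicative jump in cost yields a smaller and smaller absolute gain, which is the shape of the bearish argument.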
Will GOFAI be part of the intelligence explosion?
Where you say synthetic data, but in fact it will be writing its own source code in some important way.
There was an interesting paper showing that you can use diffusion to come up with model weights.
I don't know how legit that was, but something like that.
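The paper isn't named here, but the rough idea, treating a flattened weight matrix like any other data and sampling it with a denoising loop, can be sketched as below. The `denoise` function is a stand-in for a learned network, and the schedule and shapes are arbitrary toy choices, not anything from the paper being referenced.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(noisy_weights: np.ndarray, t: int) -> np.ndarray:
    # Placeholder denoiser: a real system would use a neural net trained to
    # predict the noise (or the clean weights) at step t. Here we just shrink
    # the sample toward zero so the loop runs end to end.
    return noisy_weights * 0.9

def forward_noise(weights: np.ndarray, T: int = 50) -> np.ndarray:
    # Forward process: repeatedly mix the weights with Gaussian noise.
    x = weights.copy()
    for _ in range(T):
        x = np.sqrt(0.99) * x + np.sqrt(0.01) * rng.normal(size=x.shape)
    return x

def reverse_sample(shape: tuple, T: int = 50) -> np.ndarray:
    # Reverse process: start from pure noise and iteratively denoise,
    # DDPM-style, to "generate" a weight matrix.
    x = rng.normal(size=shape)
    for t in reversed(range(T)):
        x = denoise(x, t)
    return x

# "Weights" of a tiny 4x4 linear layer, sampled rather than trained.
sampled_weights = reverse_sample((4, 4))
print(sampled_weights.shape)
```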
So crucially, the point is that the algorithmic overhang is really high, and maybe this is something we should touch on explicitly: even if you can't keep dumping more compute beyond the models that cost a trillion dollars or something,
the fact that the brain is so much more data efficient implies that we already have the compute. If we had the brain's training algorithm, if we could train as sample-efficiently as humans do from birth, we could make AGI.
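The sample-efficiency gap being pointed at can be made concrete with some very rough orders of magnitude; both numbers below are ballpark assumptions rather than measurements.

```python
# Rough order-of-magnitude comparison behind the sample-efficiency point.
# Both figures are assumptions for illustration, not measured values.
human_words_by_adulthood = 3e8   # commonly cited rough estimate of words a person hears/reads
llm_training_tokens = 1e13       # rough scale of a modern frontier pretraining run

ratio = llm_training_tokens / human_words_by_adulthood
print(f"LLMs see roughly {ratio:,.0f}x more text than a human does growing up")
# If an algorithm matched human sample efficiency, existing compute would go much further.
```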
How do we think about that? What is the explanation for why that would be the case?
Like a bigger model just sees the exact same data.
At the end of seeing that data, it's...
Learned more from it.
It has more space to represent it.
Yeah.
For the audience, you should unpack that: first of all, what superposition is, and why that is an implication of superposition.
Okay, there's so many interesting threads there.
The first thing I want to ask is about the thing you mentioned, that these models are trained in a regime where they're over-parameterized.
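As a rough illustration of what over-parameterization and superposition point at, the toy sketch below packs more "features" than dimensions into a vector space and measures how much their directions interfere; the feature counts and widths are arbitrary assumptions, not how the models being discussed are actually analyzed.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_interference(n_features: int, d_model: int) -> float:
    # Assign each feature a random unit direction in a d_model-dimensional space,
    # then measure how much the directions overlap (mean off-diagonal |cosine similarity|).
    dirs = rng.normal(size=(n_features, d_model))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    gram = dirs @ dirs.T
    off_diag = gram[~np.eye(n_features, dtype=bool)]
    return float(np.mean(np.abs(off_diag)))

# Toy numbers: 1,000 "features" packed into residual streams of increasing width.
for d in (64, 256, 1024):
    print(f"d_model={d}: mean interference {mean_interference(1000, d):.3f}")
```

The interference shrinks roughly like one over the square root of the width, which is one way to read "a bigger model has more space to represent it": the same features can share the space with less overlap.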