Sholto Douglas
MARK MANDELMANN- And constantly move the goalposts.
Now, that being said, one caveat is: if software engineering is just dramatically better than computer use (I mean, computer use still sucks), then I'd still be like, oh, maybe everyone just kept focusing on software engineering.
It was just by far the most valuable thing.
Every marginal person and dollar went towards software engineering.
I don't think that's the case.
I do think computer use is valuable enough that people will care about it.
But that's my one escape hatch that I'm putting in place for next year.
Oh, like as in if the models didn't get any better?
So one intuition pump is that this conversation was had a lot when models were GPT-2-sized and fine-tuned for various things.
And people would find that the models were dramatically better at things that they were fine-tuned for.
But by the time you get to GPT-4, when it's trained on a wide enough variety of things with enough total compute, it generalized very well across all of the individual subtasks, and actually generalized better than smaller fine-tuned models in a way that was extremely useful.
I think right now what we're seeing with RL is pretty much the same story playing out. There's this jaggedness, where models are good at the specific things they're trained on, but as we expand the total amount of compute we do RL with,
you'll start to see the same transition we saw from GPT-2 fine-tunes to GPT-3 and GPT-4: unsupervised meta-learning and generalization across tasks.
And I think we're already seeing early evidence of this in the models' ability to generalize reasoning to new domains.
I think this will become extremely obvious.
MARK MANDELMANN- Well, I mean, have we ever RL'd the model to be an interp agent?
No.
FRANCESC CAMPOY- I mean, no.
Yeah, exactly.