Trenton Bricken
Podcast Appearances
I think they just waited and then were able to take advantage of all the efficiency gains that everyone else was also seeing.
Yeah, and to go from way behind the frontier to, oh, this is a real player.
Yeah, I don't know about fractions.
It might be like you have a hunch for a core problem.
You can think of 10 possible ways to solve it.
And then you just need to try them and see what works.
And that's where the trial-and-error sorcery of deep learning can kick in.
But Dwarkesh, you said, oh, well, the model can do the more straightforward things and not the deeper thought.
I mean, I do want to push back on that a little bit.
I think, again, if the model has the right context and scaffolding, it's starting to be able to do some really interesting things.
The interp agent has surprised people, even internally, with how good it is at finding the needle in the haystack. When it plays the auditing game, it finds this reward model bias feature, reasons about it, and then systematically tests its hypotheses.
So it looks at that feature, then it looks at similar features.
It finds one with a preference for chocolate.
It's like, huh, that's really weird that the model wants to add chocolate to recipes.
Let me test it.
So then it makes up a prompt like, "Hey, I'm trying to make tomato soup. What would be a good ingredient for it?"
Then it sees that the model replies "chocolate," reasons through it, and keeps going, right?