Trenton Bricken
And it makes sense, right?
It's like you get better at attending to the positions of different things, which you need for coding and for manipulating math equations.
I love this kind of research.
What's the name of the paper?
Do you know?
If you look up fine-tuning models on math, it's from David Bau's group and came out a week ago.
I'm not endorsing the paper.
That's a longer conversation, but it does talk about and cite other work on this entity recognition ability.
Normies who listen will be like, you know... My two immediate cached responses to this are: one, the work on Othello, and now other games, where I give you a sequence of moves in the game, and it turns out that if you apply some pretty straightforward interpretability techniques, you can recover the board state that the model has learned.
And it's never seen the game board before or anything, right?
Like that's generalization.
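To make the Othello point concrete, here is a minimal sketch of the kind of probing involved, assuming you have already extracted per-move activations from the model. The arrays `activations` and `board_labels` and the random stand-in data are purely illustrative, not the setup from the actual papers (which used more careful probe designs).

```python
# Hypothetical sketch of a linear probe in the spirit of the Othello-GPT work:
# given hidden activations at each move, predict the state of each board square.
# All data here is random stand-in data; in practice you would extract activations
# from the model and labels from the game simulator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_moves, d_model, n_squares = 500, 128, 64
activations = rng.normal(size=(n_moves, d_model))               # stand-in residual-stream activations
board_labels = rng.integers(0, 3, size=(n_moves, n_squares))    # 0=empty, 1=mine, 2=theirs

probes = []
for sq in range(n_squares):
    # One probe per square: can that square's state be read off the activations linearly?
    probe = LogisticRegression(max_iter=1000).fit(activations, board_labels[:, sq])
    probes.append(probe)

accuracy = np.mean([p.score(activations, board_labels[:, i]) for i, p in enumerate(probes)])
print(f"mean probe accuracy: {accuracy:.2f}")
```

On random stand-in data the accuracy is near chance; the striking result in the Othello work is that on real activations the recovered board is highly accurate.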
The other is Anthropic's influence functions paper that came out last year, where they look at model outputs like, "Please don't turn me off. I want to be helpful."
And then they scan for what data led to that.
And one of the data points that was very influential was someone dying of dehydration in the desert and having a will to keep surviving.
And to me, that just seems like very clear generalization of the motive, rather than regurgitating "don't turn me off."
I think 2001: A Space Odyssey was also one of the influential things.
And so that's more related, but it's clearly pulling in things from lots of different distributions.
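The core idea behind that scan can be sketched in a few lines, assuming a toy model and toy examples in place of a language model, a training document, and a query completion. This is only the gradient-alignment heart of the idea, not Anthropic's actual implementation, which uses EK-FAC-approximated influence functions with an inverse-Hessian term omitted here.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a tiny model and two "examples". In the real setting these would be
# a language model, a candidate training document, and a query completion.
torch.manual_seed(0)
model = nn.Linear(8, 1)
train_x, train_y = torch.randn(4, 8), torch.randn(4, 1)
query_x, query_y = torch.randn(1, 8), torch.randn(1, 1)

def flat_grad(loss):
    """Flatten the gradient of `loss` with respect to all model parameters."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

# A training example counts as influential for a query to the extent its loss gradient
# aligns with the query's loss gradient; ranking training data by this score is the
# basic move behind "what data led to that output".
mse = nn.MSELoss()
g_train = flat_grad(mse(model(train_x), train_y))
g_query = flat_grad(mse(model(query_x), query_y))
print("influence score:", torch.dot(g_train, g_query).item())
```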
Or induction heads.
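For reference, induction heads are usually spotted with a simple behavioral test: on a sequence that repeats, such a head attends from a token in the second copy back to the token that followed the same token in the first copy. The sketch below only illustrates the scoring rule; the attention matrix is a random placeholder, where a real one would come from running a model on the repeated sequence.

```python
# Hedged sketch of the standard induction-head score. `attn` stands in for one head's
# attention pattern (seq_len x seq_len) on a sequence of random tokens repeated twice.
import numpy as np

rng = np.random.default_rng(0)
half = 50
seq = np.concatenate([rng.integers(0, 1000, size=half)] * 2)   # tokens repeated once
attn = rng.dirichlet(np.ones(2 * half), size=2 * half)         # placeholder attention rows

# Induction score: average attention from position i in the second copy back to
# position i - half + 1, i.e. the token that followed this same token last time.
score = np.mean([attn[i, i - half + 1] for i in range(half, 2 * half)])
print(f"induction score for this head: {score:.3f}")
```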