Trenton Bricken
And so you can imagine this big tree of semantic concepts, where biology splits into, say, cells versus whole-body biology.
And then further down, it splits into all these other things.
So rather than needing to go immediately from a thousand features to a million and then pick out that one feature of interest, you can find the direction that the biology feature is pointing in, which again is very coarse, and then selectively search around that space.
So you only do dictionary learning if something in the direction of the biology feature fires first.
And so the computer science metaphor here would be that, instead of doing breadth-first search, you're able to do depth-first search, where you only recursively expand and explore a particular part of this semantic tree of features.
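As a rough illustration of that coarse-to-fine idea, here is a minimal sketch in Python. It does not train a dictionary: the coarse and fine directions are fixed toy vectors, and the function name `coarse_to_fine`, the `threshold` parameter, and the parent-to-children mapping are all hypothetical, not anything described in the conversation. It only shows the gating step, where finer features are evaluated solely under a coarse feature that fired.

```python
import numpy as np

def coarse_to_fine(activation, coarse_dirs, fine_dirs_by_parent, threshold=0.5):
    """Score coarse features first, then expand only the ones that fire.

    All names and the threshold are illustrative assumptions.
    """
    # Score the activation against the small, coarse dictionary (ReLU of dot products).
    coarse_acts = np.maximum(coarse_dirs @ activation, 0.0)
    expanded = {}
    # Depth-first step: only look at the children of coarse features that fired.
    for parent in np.flatnonzero(coarse_acts > threshold):
        fine_dirs = fine_dirs_by_parent.get(int(parent))
        if fine_dirs is not None:
            expanded[int(parent)] = np.maximum(fine_dirs @ activation, 0.0)
    return coarse_acts, expanded

# Toy data: coarse feature 0 stands in for "biology", feature 1 for something unrelated.
coarse_dirs = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
fine_dirs_by_parent = {
    0: np.array([[0.9, 0.1, 0.0], [0.5, 0.0, 0.5]]),  # finer features under feature 0
    1: np.array([[0.0, 1.0, 0.0]]),                   # finer features under feature 1
}
activation = np.array([1.0, 0.0, 0.0])

coarse_acts, expanded = coarse_to_fine(activation, coarse_dirs, fine_dirs_by_parent)
print(sorted(expanded))  # indices of the coarse features that were expanded
```

In this toy setup the fine features under the second coarse feature are never evaluated, which is the saving the depth-first metaphor points at.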
So I haven't read the Mistral paper, but as for the heads, I think this goes back to the point that if you just look at the neurons in a model, they're polysemantic.
And so if all they did was just look at the neurons in a given head, it's very plausible that it's also polysemantic because of superposition.
So this is a line of work that we haven't pursued as much as I want to yet.
But I think we're planning to.
I hope that maybe external groups do as well.
What is the geometry of feature space?
What's the geometry?
Exactly.
And how does that change over time?
Inject more structure into the geometry.
Totally.
I mean, it would really surprise me, I guess, especially given how linear the model seems to be.
Completely agree.
That there isn't some component of the anthrax feature vector that is similar to, and looks like, the biology vector, and that they're not in a similar part of the space.
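One simple way to make "some component of the anthrax vector looks like the biology vector" concrete is to measure cosine similarity and split the finer vector into a part along the coarse direction and a remainder. This is a sketch under assumptions: the vectors below are made-up two-dimensional stand-ins, and `shared_component` is a hypothetical helper, not a method anyone in the conversation describes.

```python
import numpy as np

def shared_component(fine_vec, coarse_vec):
    """Return cosine similarity plus the parallel and residual parts of fine_vec."""
    coarse_unit = coarse_vec / np.linalg.norm(coarse_vec)
    # Cosine similarity between the two feature directions.
    cosine = float(fine_vec @ coarse_unit / np.linalg.norm(fine_vec))
    # Part of the fine vector that lies along the coarse direction.
    parallel = (fine_vec @ coarse_unit) * coarse_unit
    # Whatever is left over is specific to the finer feature.
    residual = fine_vec - parallel
    return cosine, parallel, residual

# Made-up stand-ins for an "anthrax" and a "biology" feature vector.
anthrax = np.array([3.0, 4.0])
biology = np.array([2.0, 0.0])

cosine, parallel, residual = shared_component(anthrax, biology)
print(cosine, parallel, residual)
```

A cosine well above zero would be consistent with the two features sitting in a similar part of the space; a value near zero would not.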
But yes, I mean, ultimately...