Trenton Bricken
My first paper was mapping the cerebellum to the attention operation in transformers.
My next ones were looking at sparsity.
How old were you when you wrote that?
It was my first year of grad school.
So 22.
But yeah, my next work was on sparsity in networks, inspired by sparsity in the brain, which was when I met Tristan Hume. Anthropic was doing the SoLU work, the Softmax Linear Unit, which was very related in quite a few ways: let's make the activation of neurons across a layer really sparse.
And if we do that, then we can get some interpretability of what the neuron's doing.
I think we've updated on that approach towards what we're doing now.
So that started the conversation.
I shared drafts of that paper with Tristan.
He was excited about it.
And that was basically what led me to become Tristan's resident and then convert to full-time.
But during that period, I also moved as a visiting researcher to Berkeley and started working with Bruno Olshausen, both on what's called vector symbolic architectures, where one of the core operations is literally superposition, and on sparse coding, also known as dictionary learning, which is literally what we've been doing since.
And Bruno Olshausen basically invented sparse coding back in 1997.
And so it was like my research agenda and the interpretability team's seemed to just be running in parallel in terms of research tastes.
And so, yeah, it made a lot of sense for me to work with the team.
And it's been a dream since.
Maybe you're right, but it's this sort of interesting pattern that...
Yeah, but I mean, I literally met Tristan at a conference and didn't have a scheduled meeting or anything, just joined a little group of people chatting.
And he happened to be standing there and I happened to mention what I was working on.