Zach Furman
The settings where today's mechanistic interpretability tools provide such clean, complete, and unambiguously correct results are very rare.
[Details box: "Aren't most networks uninterpretable? Why this doesn't disprove the thesis" — contents omitted.]
Therefore, while mechanistic interpretability can lend plausibility to our hypothesis, we need to turn to more indirect evidence to start building a positive case.
Indirect evidence
Quote:

"Just before OpenAI started, I met Ilya Sutskever. One of the first things he said to me was, 'Look, the models, they just want to learn. You have to understand this. The models, they just want to learn.' And it was a bit like a Zen koan. I listened to this and I became enlightened. What that told me was that the phenomenon I'd seen wasn't just some random thing. It was broad, it was more general. The models just want to learn. You get the obstacles out of their way. You give them good data. You give them enough space to operate in."