Trenton Bricken
Podcast Appearances
And this time, the model will go through the same reasoning claiming to do the calculations, and at the end say, you're right, the answer's 4.
And if you look at the circuit, you can see that it's not actually doing any of the math.
It's paying attention to the fact that you think the answer is four, and then it's reasoning backwards about how it can manipulate the intermediate computation to give you an answer of four.
I've done that.
Who hasn't?
Totally.
So I guess there are a few crazy things here.
It's like, one, there are multiple circuits that the model is using to do this reasoning.
Two, is that you can actually see if it's doing the reasoning or not.
And three, the scratch pad isn't giving you this information.
Two fun analogies for you.
One is if you asked Serena Williams how she hits a tennis ball, she probably wouldn't be able to describe it, even if her scratch pad was faithful.
But if you look at the circuit, it's as if you had sensors on every part of the body as you're hitting the tennis ball: you can actually see what operations are being done.
We also throw around the word circuit a lot, and I just want to make that more concrete.
So this is features across layers of the model all working in cooperation to perform a task.
And so a fun analogy here is you've got the Ocean's Eleven bank heist team in a big crowd of people.
The crowd of people is all the different possible features.
And we're trying to pick out, in this crowd of people, who is on the heist team and all their different functions that need to come together in order to successfully break into the bank, right?
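As a rough illustration of picking the heist team out of the crowd, here is a toy Python sketch. Everything in it is an assumption made for illustration: a tiny hand-written linear network, made-up feature names (a through e), made-up weights, and ablation (zeroing one feature and measuring how much the output moves) as a simple stand-in for whatever attribution method the research being discussed actually uses. None of it comes from the conversation itself.

```python
# Toy sketch only: hypothetical features and weights, not taken from any real model.
# Each layer maps a feature name to the upstream features it reads and their weights.
LAYERS = [
    {"a": {"x": 1.0}, "b": {"x": 0.5}, "c": {"x": 1.0}},  # layer 1 reads the input x
    {"d": {"a": 2.0, "b": 1.0}, "e": {"c": 3.0}},         # layer 2 reads layer 1
    {"out": {"d": 1.0, "e": 0.0}},                        # the output ignores e
]


def run(x, ablated=frozenset()):
    """Run the toy network, forcing any ablated feature's activation to zero."""
    acts = {"x": x}
    for layer in LAYERS:
        new_acts = {}
        for name, inputs in layer.items():
            if name in ablated:
                new_acts[name] = 0.0
            else:
                new_acts[name] = sum(w * acts[src] for src, w in inputs.items())
        acts.update(new_acts)
    return acts["out"]


def find_circuit(x, threshold=0.1):
    """Return the features whose removal changes the output by more than threshold."""
    baseline = run(x)
    candidates = [name for layer in LAYERS for name in layer if name != "out"]
    team = []
    for name in candidates:
        effect = abs(baseline - run(x, ablated=frozenset({name})))
        if effect > threshold:
            team.append(name)
    return sorted(team)


if __name__ == "__main__":
    # c and e are active (part of the crowd) but do not affect the output.
    print(find_circuit(1.0))
```

In this toy, features c and e fire on the input just like the others, but removing them leaves the output unchanged, so they are bystanders in the crowd; a, b, and d are the features the output depends on, across two layers.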