Trenton Bricken
Podcast Appearances
I wonder if, I should almost test, like, would an LLM have made that mistake?
Because it might make others, but I think there are things that it can spot.
Like, it would have no problem if I asked it to read through the entire tax code and then see what applied to me.
I mean, you can already think of models forever.
People have been calling the residual stream and multiple layers like poor man's adaptive compute.
Where, like, if the model already knows the answer to something, it will compute that in the first few layers and then just pass it through.
I mean, that's getting into the weeds.
Right.
Yeah.
I think it's important to delineate between the model planning in latent space in a single forward pass and the model having an alien language that it's outputting and using as its scratchpad.
Which one are we talking about?
No, but in the most extreme cases, it invents a new language that's super information dense.
I mean, that's what's so fun about the, if you look at the assistant tag, right?
Seeing these features light up in the auditing game for the model being evil.
Yeah.
Transluce has another example of this where you ask a Llama model, who is Nicholas Carlini?
And background context: Nicholas Carlini is a researcher who actually was at DeepMind and has now come over to Anthropic.
But the model says, oh, I don't know who that is.
I couldn't possibly speculate.
But if you look at the features behind the scenes, you see a bunch light up for AI, computer security, all the things that Nicholas Carlini does.