Yeah.
But is that where we're going?
It seems, I mean, it's an empirical question.
I think it's somewhat likely, if only because inference is expensive.
Producing tokens is expensive.
And so there will be an incentive to one, use as little thinking as you need to give the answer.
And two, if you're going to use thinking, use some complex compression.
I wonder if it will emerge more once we allow agents to talk to each other in ways where currently it's kind of trained more in isolation or with a human.
Yeah, I mean, one scary thing, though, is that with the way we render text, you can use hidden whitespace tokens that also encode information.
That's true.
And so you can imagine a world where it looks like the agent is reasoning harmlessly in its scratchpad, but it's actually hiding a bunch of data.
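(A minimal toy sketch of that kind of hiding, assuming the payload is stashed in zero-width Unicode characters; the specific characters and encoding scheme are my own illustration, not anything described in the conversation:)

```python
# Toy steganography sketch: hide bits in invisible zero-width characters
# appended to otherwise harmless-looking text.

ZERO = "\u200b"  # zero-width space      -> encodes a 0 bit
ONE = "\u200c"   # zero-width non-joiner -> encodes a 1 bit

def hide(visible_text: str, secret: str) -> str:
    """Append the secret's bits as invisible characters after the text."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    return visible_text + "".join(ONE if b == "1" else ZERO for b in bits)

def reveal(text: str) -> str:
    """Recover the hidden bytes from the invisible suffix."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

stego = hide("The plan looks harmless.", "payload")
print(stego)          # renders identically to the visible text
print(reveal(stego))  # -> "payload"
```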
5%.
I mean, bringing it back to the point that there's so much low-hanging fruit, it's been wild seeing the efficiency gains that these models have experienced over the last two years.
And yeah, with respect to DeepSeek, I mean, just really hammering this home, Dario has a nice essay on this.
It's good, yeah.
DeepSeek was nine months after Claude 3 Sonnet.
And if we retrained the same model today or at the same time as the DeepSeek work, we also could have trained it for five million or whatever the advertised amount was.
And so what's impressive or surprising is that DeepSeek has gotten to the frontier.
But I think there's a common misconception still that they are above and beyond the frontier.
And I don't think that's right.