Andy Halliday
So yeah, Opus 4.5 matches Sonnet 4.5.
So Sonnet is the next step down in the model lineup at Anthropic.
It matches Sonnet 4.5 using 76% fewer tokens.
So one quarter of the tokens.
At full throttle, with the thinking effort turned up, it beats Sonnet by 4.3 percentage points while using 50% fewer tokens, half the tokens.
So there's real efficiency built into 4.5.
And I'm not sure what steps they took.
There's no explanation yet, but I'm sure people will be studying this.
How did they get such an incredible improvement in the number of tokens consumed in the process of thinking and still beat the scores?
Yeah, there's some tuning there that is not completely revealed yet, but I'm sure, you know, Carl probably read the system card and there's a lot of information available there.
But again, this just came out; we're only talking about it today because it came out yesterday.
Yeah, let me explain how prompt caching works.
So if you're in a coding session or just in a conversation with a model and you have something that is like a guidebook for what you're talking about in that conversation, you put that into the prompt.
And then what that does is it becomes a part of the prompt that gets sent with each additional query.
And so you're basically consuming a lot of tokens repeatedly that you don't need to.
Prompt caching takes something that you set as a context for this whole conversation and puts it in memory, basically.
So you're not feeding it through every single time as new context for the inference run.
So prompt caching basically economizes on the number of tokens that are being used in every turn of the conversation, by a special system that I don't fully understand. Like, how do they actually manage this when they're feeding all of these tokens in through the system? I'm not clear about how that works, but the idea is very fundamental: it keeps that context live without you having to feed it in as a prompt each time. Yeah.
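[Editor's note: as a rough sketch of what's being described, here is how prompt caching looks with the Anthropic Python SDK. The model name and file path are placeholders, and the exact cache lifetime and billing details are not covered in the episode; the key idea is marking the large, unchanging context with cache_control so it isn't reprocessed from scratch on every turn.]

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The big "guidebook" context for the conversation (placeholder file name).
guidebook = open("project_guidebook.md").read()

response = client.messages.create(
    model="claude-opus-4-5",  # illustrative model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": guidebook,
            # Mark this block as cacheable so later turns that send the same
            # prefix can reuse the server-side cached version instead of
            # paying to process all of those tokens again.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key design constraints."}
    ],
)
print(response.content[0].text)
```

On later turns in the same conversation, requests that start with that same cached prefix can hit the cache rather than re-sending the guidebook for full processing, which is where the per-turn token savings described above come from.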