Eno Reyes
And you still have to think a little bit about, you know, what if it gives the right answer but it's not parsed correctly?
So there's a little bit of work required to make sure this is robust, but it's a generally reliable way to determine whether the LLM actually knows, at this point in time, what happened.
And so, when we evaluated our compaction method and compared it to OpenAI's compression strategy as well as Claude Code's, we found that ours was much stronger across all of these dimensions: instruction following, continuity, completeness, but most importantly, accuracy and context awareness, right?
Yeah.
Where it was just able to recall all of the critical pieces of information quite well.
Well, not just faster.
I think actually speed was relatively similar across the board.
But the two things that really matter are the quality of the compression and how much it actually compresses, right?
Basically, the token reduction efficiency.
And we do have the worst token reduction efficiency.
You know, OpenAI's is 99.3%, Claude Code's was 98.7%, and ours was 98.6%. So 0.1 off. Maybe that's within the error bars, right? But the overall quality, right, you can basically take all of these characteristics and build a sort of quality score that just says, you know, across all these dimensions, which one is stronger.
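The two metrics discussed here can be sketched in a few lines. Token reduction efficiency is just the fraction of tokens removed; the aggregate quality score is one plausible reading of "take all of these characteristics and build a quality score". The dimension names follow the conversation, but the scores and the equal weighting are hypothetical:

```python
def token_reduction(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed by compaction (e.g. 0.987 == 98.7%)."""
    return 1.0 - compressed_tokens / original_tokens

def quality_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores in [0, 1]. One possible
    aggregation; the actual scoring method isn't specified in the talk."""
    total = sum(weights.values())
    return sum(scores[d] * weights[d] for d in scores) / total

# Illustrative numbers only, not the real evaluation results.
scores = {"instruction_following": 0.90, "continuity": 0.85,
          "completeness": 0.88, "accuracy": 0.95, "context_awareness": 0.92}
weights = {d: 1.0 for d in scores}  # equal weights as a default assumption

print(round(token_reduction(100_000, 1_300), 3))  # 0.987
print(round(quality_score(scores, weights), 3))   # 0.9
```

A weighted mean makes the trade-off explicit: a method with slightly worse token reduction can still win once accuracy and context awareness are weighted in.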
I think that like probably the most important thing we learned was just how much structure matters.
Right.
So I think that probably the biggest failure case is generic summarization.
And I think the worst-performing techniques in our evaluation were the ones that basically treat all content as equally compressible.
It's just one big summary and let the LLM figure it out.
A file path could be very low entropy information, but it's probably the most important piece of information an agent needs.
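The file-path point suggests a simple structure-aware pattern: extract identifiers that must survive verbatim before summarizing, then re-attach any the summary dropped. This is a sketch of the idea, not the speakers' implementation; the path regex and the `summarize` callable (standing in for an LLM call) are assumptions:

```python
import re

# Rough matcher for slash-separated file paths (an assumption; real
# systems would track paths structurally, e.g. from tool-call arguments).
PATH_RE = re.compile(r"(?:[\w.-]+/)+[\w.-]+")

def compact(message: str, summarize) -> str:
    """Structure-aware compaction sketch: summarize the prose, but make
    sure low-entropy, high-importance tokens like file paths survive."""
    paths = PATH_RE.findall(message)
    summary = summarize(message)  # stand-in for an LLM summarization call
    preserved = [p for p in paths if p not in summary]
    if preserved:
        summary += "\nPaths: " + ", ".join(preserved)
    return summary

# Toy summarizer that would otherwise lose the path entirely.
result = compact("We edited src/main.py to fix the bug in the parser loop.",
                 lambda m: "Fixed a parser bug.")
print(result)
```

The point is the asymmetry: a path compresses to almost nothing entropy-wise, yet dropping it can leave the agent unable to act, so it is treated as non-compressible regardless of what the summarizer does.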