Eno Reyes
And so it's effectively a while loop in a while loop.
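A minimal sketch of what that nested structure might look like, assuming hypothetical `call_llm`, `run_tool`, and `is_done` helpers (these are illustrative placeholders, not a real harness API):

```python
# Hypothetical sketch: an outer loop that keeps re-invoking the agent,
# and the agent's own inner loop over tool calls.
# call_llm, run_tool, and is_done are assumed placeholder names.

def agent_turn(messages: list) -> None:
    """Inner while loop: call the model and run tools until it stops asking."""
    while True:
        response = call_llm(messages)              # assumed helper
        messages.append(response)
        if not response.get("tool_calls"):
            return                                 # model gave a final answer
        for call in response["tool_calls"]:
            result = run_tool(call)                # assumed helper
            messages.append({"role": "tool", "content": result})

def run_until_done(task: str, is_done) -> None:
    """Outer while loop: restart the agent until a completion check passes."""
    messages = [{"role": "user", "content": task}]
    while not is_done(messages):
        agent_turn(messages)
        messages.append({"role": "user", "content": "Keep going."})
```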
I honestly think that the existence of plugins like that means the agent harnesses aren't taking advantage of what they could be. But regardless, people find this out, and then they build and they customize.
And I think that's one of the coolest parts about building a really simple and modular tool: the community goes out and finds all these interesting ways to use it, and they get a lot of value out of it.
Yeah, this is a really tricky problem, because we've actually published a couple of different posts about just how hard it is. There's tons of information, and obviously there are context windows in all models, right?
You have, let's say, 1 million tokens of context or 2 million tokens of context, right?
There are cost questions involved, right?
So most model providers actually increase the price that they'll charge you for beyond a certain level of tokens, right?
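As a hedged illustration of that kind of tiered pricing (the rates and the threshold below are made-up numbers, not any provider's actual prices):

```python
# Illustrative tiered input pricing: cost per token rises past a threshold.
# All numbers here are invented for the sketch.

def input_cost_usd(tokens: int,
                   base_rate: float = 3e-6,      # $/token below the threshold
                   premium_rate: float = 6e-6,   # $/token above the threshold
                   threshold: int = 200_000) -> float:
    below = min(tokens, threshold)
    above = max(tokens - threshold, 0)
    return below * base_rate + above * premium_rate

# 200k tokens billed at the base rate, the remaining 100k at the premium rate
print(input_cost_usd(300_000))
```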
And then there's like speed and quality.
The classic lost-in-the-middle evaluation, where people showed that as you increase the context that's actually utilized, the one million tokens of available context that they'll let you send in the API call doesn't mean the LLM can actually reason through all that information. So if you think about it, what information is even in an LLM call to an agent? You have the task description that the user gave you, or the user's messages.
You have the tools, which need to be available to the agent in the system prompt, as well as the tool calls it makes throughout and their responses.
So one tool call to bash that says get me all syslogs might be 200,000 tokens, right?
You have the developer persona, so all the information about their environment, their role, you know, are we in a git repo. Of course, there's code and all of the files that it's reading: markdown, code. Maybe it browses the web and retrieves information from the web.
You have historical context that might have occurred previously.
And you have all of the other sorts of artifacts that might get pulled in from system reminders.
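Putting those pieces together, here's a hedged sketch of everything that can end up in a single agent call. The state fields and the `estimate_tokens` helper are assumptions for illustration, not the actual implementation:

```python
# Illustrative inventory of what goes into one agent LLM call.
# The state fields and estimate_tokens are assumed names for the sketch.

def report_context_usage(state) -> int:
    parts = {
        "system prompt + developer persona": state.system_prompt,
        "tool definitions":                  state.tool_schemas,
        "user messages / task description":  state.user_messages,
        "tool calls and their responses":    state.tool_transcript,
        "files being read":                  state.open_files,
        "web content retrieved":             state.web_snippets,
        "historical context":                state.history,
        "system reminders":                  state.reminders,
    }
    total = 0
    for name, content in parts.items():
        n = estimate_tokens(content)   # assumed tokenizer wrapper
        print(f"{name}: ~{n} tokens")
        total += n
    return total
```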
So basically the LLM is seeing so much information about the task at hand in order to solve it.
And so over the course of a very long conversation, it builds up.
And so in order to prevent this from crossing over a certain threshold, you have to figure out a way to compact or compress this information over time.
And so what we did initially was a naive solution that just compresses it, summarizes it, and says, keep going.
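A hedged sketch of that naive compaction step, with an assumed threshold and assumed `estimate_tokens` and `summarize` helpers (not the actual implementation):

```python
# Naive compaction sketch: once the transcript crosses a token threshold,
# replace the middle with an LLM-written summary and continue.
# COMPACT_THRESHOLD, estimate_tokens, and summarize are assumed names.

COMPACT_THRESHOLD = 150_000  # made-up number, below the model's hard limit

def maybe_compact(messages: list[dict]) -> list[dict]:
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total < COMPACT_THRESHOLD or len(messages) <= 6:
        return messages
    head, tail = messages[:1], messages[-5:]   # keep the task and recent turns
    summary = summarize(messages[1:-5])        # one LLM call: "compress this"
    compacted = {"role": "user",
                 "content": f"Summary of earlier work: {summary}. Keep going."}
    return head + [compacted] + tail
```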