Kwasi Ankomah
And this is because we've now got more models that are able to reason.
So we want to use those models as, let's call it, a planner kind of agent that might delegate to some sub-agents.
So that has become very token intensive.
So that's one thing.
The second thing is that because of that,
speed has become more important, right?
So the way we actually talked about it is: if you just do one call to the LLM, one inference, as we discussed, and that call is slow, maybe three to four seconds, you think, okay, I'll take the hit, right?
It comes back from the model and I get my answer.
Now, because agents do so many calls, that delay compounds across every step.
And we've seen this, right?
Like, in the applications I build, we do 10 to 20 calls.
And if each of those 10 to 20 calls takes around three seconds, suddenly you're sitting there waiting quite a long time.
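The arithmetic behind that wait can be sketched as a back-of-envelope model. This is just an illustration of sequential call latency; `total_latency` is a hypothetical helper, not any real API:

```python
def total_latency(num_calls: int, per_call_s: float) -> float:
    """Sequential agent loop: each step waits for the previous call,
    so total latency is simply calls x per-call latency."""
    return num_calls * per_call_s

# One interactive call at ~3 s feels acceptable to a user...
single = total_latency(1, 3.0)   # 3.0 seconds

# ...but an agent making 10 to 20 sequential calls turns that
# into a 30-to-60-second wait for the same user.
low = total_latency(10, 3.0)     # 30.0 seconds
high = total_latency(20, 3.0)    # 60.0 seconds
```

The growth is linear in the number of calls, but from the user's point of view the experience degrades sharply once a single interaction crosses tens of seconds.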
And I can give you an example of like a coding agent, right?
Say a coding agent takes the prompt and thinks about it.
It says, okay, I'm going to hand this to the planner agent.
That planner agent then goes to a sandbox, which runs the code and then feeds the result back for analysis.
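The loop just described can be sketched in a few lines. All the names here (`plan`, `run_in_sandbox`, `analyze`) are hypothetical stand-ins for what would be separate LLM calls and an isolated execution environment, not any real framework:

```python
import contextlib
import io


def plan(prompt: str) -> str:
    """Hypothetical planner agent: in reality an LLM call that
    decides what code to run. Here it returns a fixed snippet."""
    return "print(2 + 2)"


def run_in_sandbox(code: str) -> str:
    """Hypothetical sandbox: executes code and captures its output.
    A real sandbox would run this in an isolated process or container."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code)
    return buf.getvalue().strip()


def analyze(output: str) -> str:
    """Hypothetical analysis step: in reality another LLM call that
    inspects the sandbox output and decides what to do next."""
    return f"sandbox returned: {output}"


result = analyze(run_in_sandbox(plan("add two and two")))
```

Each of those three steps is a round trip in the real system, which is exactly why the per-call latency discussed above multiplies so quickly.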
Yeah.