Alex Svanevik
I guess it's quality assurance, or evaluation of how the agent is doing.
So we basically measure: if we make changes to our agent, does it hallucinate more?
Like, can it make stuff up? Because in theory you could go to one of the other ones, one of these not-so-good AI products in crypto, and it just makes something up.
Here's a token, and it literally makes up the symbol.
It makes up some stats.
So we really religiously measure to what extent it hallucinates, and then of course you minimize the amount of hallucination, and you don't release it unless it's above a certain threshold in its ability to not hallucinate.
Another example: if you have certain prompts that are execution-focused, like "buy ten dollars' worth of PENGU", does it manage to translate that into the actual order?
The right token, the right amount, buy and not sell, all of that.
We measure that.
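The execution check described above can be pictured as comparing the order the agent produced against the order the prompt actually asked for. A minimal sketch, assuming a hypothetical `Order` record with just token, dollar amount, and side (the real order schema and tolerance are not described in the interview):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Order:
    # Hypothetical structured order; the agent's real schema is not public.
    token: str
    amount_usd: float
    side: str  # "buy" or "sell"


def order_matches(expected: Order, actual: Order, tol: float = 0.01) -> bool:
    """True only if token, side, and amount all match (amount within `tol` dollars)."""
    return (
        expected.token.upper() == actual.token.upper()
        and expected.side == actual.side
        and abs(expected.amount_usd - actual.amount_usd) <= tol
    )


def execution_accuracy(cases: List[Tuple[Order, Order]]) -> float:
    """Fraction of (expected, actual) pairs where every field came out right."""
    if not cases:
        return 0.0
    return sum(order_matches(e, a) for e, a in cases) / len(cases)


# "buy ten dollars' worth of PENGU" should become Order("PENGU", 10.0, "buy").
cases = [
    (Order("PENGU", 10.0, "buy"), Order("PENGU", 10.0, "buy")),   # correct
    (Order("PENGU", 10.0, "buy"), Order("PENGU", 10.0, "sell")),  # wrong side
]
print(execution_accuracy(cases))  # 0.5
```

Scoring a case as all-or-nothing reflects the point that a trade with the right token but the wrong side is still a failed trade.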
We measure how good the formatting of the output is.
Is it actually concise and clear?
Or is it way too long and convoluted?
And so we have a bunch of these different internal metrics that we use to measure the quality of the agent and make sure it's really, really good.
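The "don't release unless it clears a threshold" rule, plus the question of whether a change makes the agent hallucinate more, amounts to a release gate over several metrics. A minimal sketch, assuming hypothetical metric names and made-up threshold values (the interview only mentions "a certain threshold") and that every metric is scored so higher is better:

```python
from typing import Dict, List, Optional

# Hypothetical metric names and minimum scores; the real values are not stated.
THRESHOLDS: Dict[str, float] = {
    "non_hallucination_rate": 0.98,  # share of answers with no made-up facts
    "execution_accuracy": 0.99,      # share of trade prompts turned into the exact order
    "formatting_score": 0.90,        # share of answers judged concise and clear
}


def release_blockers(
    candidate: Dict[str, float],
    baseline: Optional[Dict[str, float]] = None,
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[str]:
    """List the reasons a candidate agent version should not ship (empty list = ship)."""
    blockers = []
    for name, minimum in thresholds.items():
        value = candidate.get(name, 0.0)  # a missing metric counts as failing
        if value < minimum:
            blockers.append(f"{name} below threshold")
        elif baseline is not None and value < baseline.get(name, 0.0):
            # Clears the absolute bar but is worse than the current release,
            # e.g. the change made the agent hallucinate more.
            blockers.append(f"{name} regressed")
    return blockers


def can_release(candidate: Dict[str, float], baseline: Optional[Dict[str, float]] = None) -> bool:
    return not release_blockers(candidate, baseline)


baseline = {"non_hallucination_rate": 0.99, "execution_accuracy": 0.995, "formatting_score": 0.92}
candidate = {"non_hallucination_rate": 0.985, "execution_accuracy": 0.995, "formatting_score": 0.93}
print(release_blockers(candidate, baseline))  # ['non_hallucination_rate regressed']
```

Checking both an absolute floor and the previous version's score catches a change that is still "good enough" on paper but makes the agent hallucinate more than what is already shipped.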
And then we also measure feedback in the mobile app, where you can give a response a thumbs up or thumbs down.
So we measure how much people actually like the answer.
That's not enough on its own, but you do that as well.
And our score there is that 95% of the answers get a thumbs up.
So people generally really like the responses.
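The in-app score is a simple ratio of positive ratings to all ratings. A minimal sketch, assuming each rating is stored as the hypothetical string `"up"` or `"down"` and that unrated responses are left out of the denominator (the interview does not say how unrated answers are counted):

```python
from collections import Counter
from typing import List


def approval_rate(votes: List[str]) -> float:
    """Share of rated responses that got a thumbs up; returns 0.0 if nothing was rated."""
    counts = Counter(votes)
    rated = counts["up"] + counts["down"]
    return counts["up"] / rated if rated else 0.0


votes = ["up"] * 19 + ["down"]
print(f"{approval_rate(votes):.0%}")  # 95%
```

As the speaker notes, this user signal complements the internal metrics rather than replacing them: a user can like a confident answer that still contains a made-up number.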
So I'd say the first overall area is just quality, which relates to trust.
Because what is inevitably going to happen next year is you're going to have tons of agentic trading products.