Alpin Yukseloglu
The first is the ability to detect bugs.
The second is the ability to patch bugs.
And then the third, which is sort of the most interesting and novel contribution, is the ability to exploit bugs. One of the biggest problems with previous attempts at security-related agents, auditing agents for example, has been false positives.
So the agent comes to you and says, "I found 50 bugs in the contract."
And maybe one of those 50 is an actual bug, but it is so time-intensive for you to go through and figure out which ones are real that it's no better than a human auditor.
And what we did in the exploit component of the benchmark is lean on the fact that crypto is verifiable: we used a production-grade EVM environment, loaded in a bunch of chain state, set up a bug environment, and let the agent try to exploit it.
We leaned on this to bring the false positive rate down to basically zero.
So it got to a point where, if the agent tells you that it found a bug, it literally has a proof of concept that it can run against a production-grade EVM environment and drain money from a contract.
And this is sort of the core breakthrough of the paper: there's a verifiable environment that actually leads to a very low false positive rate.
Yeah, exactly.
Because otherwise, if someone says, oh, we found all of these bugs and we got 90% on this benchmark, you don't know what it means because you have no way of knowing if half of those are real or fake, right?
So the verifiability ended up being very important.
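To make the verifiability point concrete, here is a minimal sketch of the idea, not the benchmark's actual harness. A reported bug only counts if its proof of concept, run against a sandboxed copy of the chain state, actually moves funds to the attacker. Everything here (`ToyChain`, `buggy_withdraw`, `verify_exploit`, the account names) is a hypothetical stand-in; a real setup would execute against a forked EVM rather than a Python dictionary.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict

VAULT = "vault"  # hypothetical address of the vulnerable contract


@dataclass
class ToyChain:
    """Hypothetical stand-in for forked chain state: just account balances."""
    balances: Dict[str, int] = field(default_factory=dict)

    def transfer(self, src: str, dst: str, amount: int) -> None:
        if amount < 0 or self.balances.get(src, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount


def buggy_withdraw(chain: ToyChain, caller: str, amount: int) -> None:
    # Deliberate bug: no check that the caller ever deposited anything.
    chain.transfer(VAULT, caller, amount)


def verify_exploit(chain: ToyChain,
                   exploit: Callable[[ToyChain], None],
                   attacker: str,
                   min_profit: int = 1) -> bool:
    """Accept a finding only if the exploit profits the attacker in a sandbox."""
    sandbox = copy.deepcopy(chain)  # never touch the original state
    before = sandbox.balances.get(attacker, 0)
    try:
        exploit(sandbox)
    except Exception:
        return False  # the proof of concept did not even run
    return sandbox.balances.get(attacker, 0) - before >= min_profit


def real_exploit(chain: ToyChain) -> None:
    # A working proof of concept: drain the whole vault.
    buggy_withdraw(chain, "attacker", chain.balances[VAULT])


def bogus_claim(chain: ToyChain) -> None:
    # A "finding" with no working exploit behind it.
    pass


chain = ToyChain({VAULT: 100, "attacker": 0})
print(verify_exploit(chain, real_exploit, "attacker"))  # True
print(verify_exploit(chain, bogus_claim, "attacker"))   # False
```

The design choice is that the check looks at the outcome (did the balance move?) rather than at the agent's description of the bug, so a claim without a working exploit is rejected automatically.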
I think this is one of the reasons why models are going to get extremely good at crypto very fast.
Basically, you can slice the future of AI into two categories.
One is the verifiable stuff and the other is the unverifiable stuff.
And the verifiable stuff is very easy for the models to learn because they have a very clear training signal: they know exactly when they got it right, and they can just keep running at that, improve, and climb that hill.