Azeem Azhar
And so you end up in this quite odd space.
And let me give you that example.
If you've got an AI system that is unreliable, so say 10% of the time it makes errors, you're unlikely to put it into any kind of automated workflow.
You'll want to sit on top of it because one time in 10, it's going to make a muck up.
Now, when that error rate drops to 1%, you'll feel much more confident about putting it into some kind of automated system, automated workflow, hundreds of times a minute, thousands of times an hour, tens of thousands of times a day.
But that 1% hallucination rate will show up time and again.
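To put rough numbers on that point, here is a minimal sketch in Python. It assumes every run fails independently with the same probability, which is a simplification; the function names are hypothetical and chosen only for illustration.

```python
def expected_errors(error_rate: float, runs: int) -> float:
    """Average number of failed runs, assuming a fixed per-run error rate."""
    return error_rate * runs


def prob_at_least_one_error(error_rate: float, runs: int) -> float:
    """Chance that at least one of `runs` independent runs goes wrong."""
    return 1 - (1 - error_rate) ** runs


# Illustrative volume only: 10,000 runs in a day at a 1% error rate.
daily_errors = expected_errors(0.01, 10_000)      # about 100 bad runs
any_error_in_100 = prob_at_least_one_error(0.01, 100)  # roughly 0.63
```

Under these assumptions, a 1% error rate at ten thousand runs a day still means on the order of a hundred bad outputs daily, which is why the rate keeps showing up.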
Or consider a series of individual steps chained one to another.
Imagine you've got a process with 25 steps.
Well, a 1% hallucination rate means that each step succeeds 99 times out of 100, but across a chain of 25 steps, it will mean that roughly one time in five the chain will fail.
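The arithmetic behind the 25-step example can be sketched as follows. This assumes each step succeeds independently with the same probability and that a single failed step fails the whole chain; both are simplifying assumptions, and the function names are hypothetical.

```python
def chain_success_probability(step_success: float, steps: int) -> float:
    """Probability that every step in the chain succeeds (independent steps)."""
    return step_success ** steps


def chain_failure_probability(step_success: float, steps: int) -> float:
    """Probability that at least one step in the chain fails."""
    return 1 - chain_success_probability(step_success, steps)


# 99% success per step, 25 steps chained together.
success = chain_success_probability(0.99, 25)   # about 0.78
failure = chain_failure_probability(0.99, 25)   # about 0.22
```

So the chain fails a little more than one time in five, even though each individual step is 99% reliable.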
And so, wow, we've got the hallucination rate down, as OpenAI has with GPT-5, enough to say we don't need to attend to it all the time, but not so low that you can let it run thousands of times without there being lots of problems, or power a 25-step workflow through it.
And so you end up with this funny space, which is, of course, the technology is better than it was.
Of course, it's giving us more value for the $20 or $200 a month, but we really feel that gap.
So why does this all matter?
Shifting goalposts mean we redefine success as soon as it's achieved.
Negative space means every improvement makes what is still missing even more obvious.
Together, they guarantee that GPT-5 or GPT-6 or any new model from Anthropic or from Google or DeepSeek will probably feel less revolutionary than perhaps it really is.
GPT-5 didn't fail to impress us because it wasn't impressive.
It failed to impress us because we were never going to be impressed.