Azeem Azhar
So on the FrontierMath test, which GPT-4 used to score 2% on, o3 got 25%.
And I think that this is clearly what we are seeing.
And if you're using the tools, you have noticed that they've got better.
Although, as people also know, they're really, really quite unstable.
Something that happened with our team was that we had standardized on Gemini Flash 2.0 for some of our internal workflows.
It was quick, it was fast, it was bright enough.
I messed around and I moved to 2.5 Flash in one case and 2.5 Pro, which is a more thinky model.
And in fact, the system started to break because those models overthought the problem they were given.
And I had designed the bits of the process on either side for that kind of output you get from Flash 2.0.
So hidden in this idea of kind of consistent progress is also this idea of instability and change costs.
The second thing was the idea of warp speed deployment.
And I said that AI use would skyrocket and token prices would collapse, that AI agents would become more common, that Gen AI startups would exceed $100 million in revenue.
And I explicitly said, outside of the main foundation models.
And this has clearly, clearly been the case, the warp speed deployment.
Sundar Pichai went off and said that Google saw a 50-fold increase in demand for tokens across its various product surfaces in the year.
Microsoft, which had a head start, was reporting a five-fold increase.
We did modeling on the Chinese market and we're seeing a 30 to 40-fold increase in a 12-month period.
So