Azeem Azhar
So there were seven predictions, and I had some watch list themes around geopolitical volatility, demographic decline, and climate and capital, which I didn't have strong tests against.
But let's start with the first one, which was that there would be no AI wall.
And I said, look, research is accelerating, not plateauing.
And we would likely see a 10 million token context model
and reasoning breakthroughs across some of these reasoning benchmarks.
Now, both of those matter because the context window of a model is a little bit like its working memory.
It's a bit shonky as a working memory, but it's the bit that you put into your LLM and it can manipulate back and forth.
And when you get to the end of the context window, it tends to hallucinate much, much more.
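(A minimal sketch of that idea, assuming the tiktoken tokenizer with the cl100k_base encoding; the 10-million-token budget is purely illustrative, not any specific model's spec. The prompt fills the model's "working memory", and whatever is left is what it can still attend to.)

```python
# Minimal sketch: treat the context window as a fixed token budget and
# check how much "working memory" a prompt leaves. The tokenizer choice
# and the 10M-token budget are illustrative assumptions.
import tiktoken

CONTEXT_BUDGET = 10_000_000  # hypothetical ultra-long context window

enc = tiktoken.get_encoding("cl100k_base")

def remaining_context(prompt: str, budget: int = CONTEXT_BUDGET) -> int:
    """Tokens of context left after the prompt is loaded in."""
    return budget - len(enc.encode(prompt))

print(remaining_context("Summarise everything we know about AI progress."))
```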
And the reasoning tests, like FrontierMath and ARC-AGI, are very useful tests of whether a model is capable not just of producing text, but also of reasoning through problems.
Now, in both cases, I asked ChatGPT o3 to mark my work.
And it said, you are accurate.
There's been no slowdown in AI progress, ultra-long context models have emerged, and AI models are shattering reasoning benchmarks.
And that's true.
I mean, there is this 10 million token context window with Meta's Llama 4 Scout model.
But I want to say that even though that model was released,
Meta had had all sorts of issues with its Llama models, and there were problems at the time.
So while ChatGPT has very gracefully given me that as the evidence point that I was right, I don't think that's the best evidence point.
There are a couple of other models out there, Magic.dev's, and I believe Gemini has one in testing, which are up at that 10 million level.
And at the same time, o3 itself has really, really pushed its benchmark scores.