Andy Halliday
surpassed everybody else in the International Math Olympiad.
So in pure math reasoning.
But it doesn't stop there.
So research problems don't come with clean answers.
So you have to go through proofs and refutations and all these other things.
And that's what this proof process is.
So it's not just coming up with the answer.
It's you've got to demonstrate the logic and the conclusions that you arrived at.
And so Google built a research agent called Aletheia,
and this is the first I've ever heard of it, on top of Google DeepThink.
And it generates proofs, checks them with a natural language verifier, revises weak steps in the process, and restarts if the logic fails.
And it will also come back and say, oh, I can't solve this problem.
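[Editor's note: the loop described here (generate a proof, check it, revise weak steps, restart if the logic fails, and admit defeat when nothing works) can be sketched roughly as below. This is only an illustration of the control flow as described in the conversation, not Google's implementation. Every name in it (`prove`, `generate`, `find_weak_step`, `revise`, and the two limits) is hypothetical, and the model calls are stand-in functions supplied by the caller.]

```python
from typing import Callable, List, Optional

Proof = List[str]  # a proof is modeled here as a plain list of step strings


def prove(
    problem: str,
    generate: Callable[[str], Proof],                  # hypothetical: drafts a proof
    find_weak_step: Callable[[Proof], Optional[int]],  # hypothetical verifier
    revise: Callable[[Proof, int], Proof],             # hypothetical: repairs one step
    max_restarts: int = 3,                             # assumed limit, not from the source
    max_revisions: int = 5,                            # assumed limit, not from the source
) -> Optional[Proof]:
    """Return a proof the verifier accepts, or None for "I can't solve this"."""
    for _ in range(max_restarts):
        proof = generate(problem)          # start from a fresh draft
        revisions = 0
        while True:
            weak = find_weak_step(proof)   # index of a weak step, or None if accepted
            if weak is None:
                return proof               # verifier found nothing to object to
            if revisions == max_revisions:
                break                      # this line of attack failed: restart
            proof = revise(proof, weak)    # patch only the flagged step
            revisions += 1
    return None                            # give up rather than return a bad proof
```

[Returning `None` instead of the last draft mirrors the point made in the conversation: the agent can come back and say it could not solve the problem.]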
But what it's done is it's gotten to 90% on the International Math Olympiad Proof Bench Advanced program.
So it's not just one benchmark that's being mastered.
This model, particularly Google DeepThink, is really kind of hitting all the marks when it comes to advanced scientific mathematics and logic research by AI.
I think that you're right.
It has to be giving people pause, people who have had the most advanced intelligence as demonstrated by their ability in mathematics particularly.
Because that's... it's mind-boggling to me how... I mean, I have trouble with arithmetic in my head.