The Daily AI Show
The Problem With AI Benchmarks

AI models, and specifically large language models, are used to do spreadsheets, and now they're becoming more and more competent in doing those kinds of financial analysis-type things.

But when you get to the realm of pure mathematics, where mathematicians are working on proofs and developing new theorems in sort of the ethereal world of mathematics, LLMs are not very impressive.

And one of the world's biggest mathematicians, a fellow named Joel David Hamkins, has slammed AI models used for solving mathematics, calling them "zero" and "garbage" and adding that he doesn't find them useful at all.

He highlighted AI's frustrating tendency to confidently assert incorrect conclusions and resist correction.

They'll argue with him.

And he said, quote, if I were having such an experience with a person, I would simply refuse to talk to that person again.

Now, bring on Axiom Math.

Founded by a 24-year-old Stanford dropout, Carina Hong, it raised a $64 million seed round to build an AI mathematician.

And major investors are behind this, including Greycroft and Menlo Ventures, a couple of VC firms whose names I recognize.

And its core architectural idea is to move away from generic next-token prediction, which, as we know, creates hallucinations in LLMs, and instead use a stack that tightly couples a language modeling kernel with formal proof systems and programmatic reasoning from mathematics. It's not trained on the broad web and conversational data, so it's not going to spin out, and you can't jailbreak it and have it talk about politics or anything.

It's a math-specific shell of formal mathematical languages, proof checkers, and verification-driven training signals that goes beyond standard LLMs.

It doesn't have all those problems, and as a result, because each reasoning step is meant to be checked by a proof engine as it's running, it is virtually free of hallucinations that are common in generic LLM outputs.
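
To make "checked by a proof engine" concrete, here is a minimal Lean 4 sketch. Lean is chosen here only as an illustration; the episode does not say which proof system Axiom Math uses, and this is not their code. The point is just that a formal checker accepts a step only when it actually follows, so a confidently stated but wrong claim is rejected instead of being passed along.

```lean
-- Lean 4, core library only. An illustration, not Axiom Math's system.

-- A true statement with a valid proof: the checker accepts it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, verified by computation.
example : 2 + 2 = 4 := rfl

-- A false claim cannot be proved this way. If the next line is
-- uncommented, Lean reports an error instead of accepting it:
-- example : 2 + 2 = 5 := rfl
```

In a setup like the one described, a language model would propose steps such as these and the checker would decide which ones stand.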

So, David...

You know, check it out.

Axiom Math.