Andy

Its core architectural idea is to move away from generic next-token prediction, which, as we know, creates hallucinations in LLMs, and instead use a stack that tightly couples a language modeling algorithm,

429.158 View full episode →

The Daily AI Show

The Problem With AI Benchmarks

as a sort of kernel, with formal proof systems and programmatic reasoning from mathematics.

It's not trained on the broad web and conversational data, so it's not going to spin out, and you can't jailbreak it into talking about politics or anything.

It's a math-specific shell of formal mathematical languages, proof checkers, and verification-driven training signals that goes beyond standard LLMs.

So...

It doesn't have all those problems. Because each reasoning step is meant to be checked by a proof engine as it's running, it is virtually free of the hallucinations that are common in generic LLM outputs.
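
To make the "each step is checked as it runs" idea concrete, here is a minimal sketch of a propose-then-verify loop. It is not how Axiom Math is built; the episode gives no implementation details. The names `check_step` and `verified_chain` are hypothetical, and the "proof engine" is just exact arithmetic standing in for a real proof checker.

```python
from fractions import Fraction
from typing import Callable, Iterable, List, Tuple

# A step claims that a simple expression "a op b" equals a given value.
Step = Tuple[str, Fraction]


def check_step(step: Step) -> bool:
    """Hypothetical stand-in for a proof engine: recompute the claim exactly."""
    expr, claimed = step
    a, op, b = expr.split()
    x, y = Fraction(a), Fraction(b)
    results = {"+": x + y, "-": x - y, "*": x * y}
    return op in results and results[op] == claimed


def verified_chain(
    candidates: Iterable[Step],
    checker: Callable[[Step], bool] = check_step,
) -> List[Step]:
    """Accept steps one at a time and stop at the first one the checker rejects."""
    accepted: List[Step] = []
    for step in candidates:
        if not checker(step):
            break  # an unverified step never enters the chain
        accepted.append(step)
    return accepted


# Steps as a generator model might propose them (assumed data for illustration).
proposed = [
    ("2 + 3", Fraction(5)),
    ("5 * 4", Fraction(20)),
    ("20 - 7", Fraction(12)),  # wrong on purpose: a "hallucinated" step
    ("12 + 1", Fraction(13)),
]

accepted = verified_chain(proposed)
print(accepted)
```

In this toy run, the chain stops before the incorrect third step, so the later step that depends on it is never accepted either. A real system would swap the arithmetic check for a formal proof checker.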

So, David...

You know, check it out.

Axiom Math.