Andy Halliday

Speaker
3893 total appearances

Podcast Appearances

The Daily AI Show
Deep Sea Strikes First and ChatGPT Turns 3

So an 8 billion parameter model trained with that system surpassed GPT-5 and Claude Opus 4.1 on Humanity's Last Exam.

So that's a pretty good accomplishment.

I'm not comparing it to GPT-5.1.

or Opus 4.5.

The research happens to have been done before the release of those models.

So who knows whether those models would be a little bit better on Humanity's Last Exam, but it scored above GPT-5 and Claude Opus 4.1 while being two and a half times faster and more efficient.

Like the orchestration process does something better than large model inference.

Okay, so even when they used unseen tools, tools that it's not familiar with, the orchestrator adapted to that well, showing that it can work with changing tool sets and be kind of exploratory or discover new tools to use.

And so...

ToolOrchestra is this new approach.

It's a new old approach.

Like we've seen this emerging, I think, this idea of taking smaller dedicated models in a strapped architecture that puts them all together with a commander, right?

This orchestrator, the tool orchestrator.

And when you do that, you get faster, more efficient and smarter, right?

than you do with even the then most advanced frontier models.
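
The commander-plus-specialists idea described here can be sketched in a few lines of Python. This is only a toy illustration, not the system discussed in the episode: in that research the orchestrator is a trained 8 billion parameter model, whereas this sketch swaps in a simple keyword-overlap rule. The `Tool` and `Orchestrator` names, the `cost` field, and the scoring rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str           # plain-text summary the orchestrator matches against
    cost: float                # hypothetical relative cost per call
    run: Callable[[str], str]  # the tool or model being wrapped


class Orchestrator:
    """Toy commander: picks the best-matching, cheapest tool for a task.

    Assumption: keyword overlap stands in for a learned routing policy.
    """

    def __init__(self) -> None:
        self.tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # Tools can be added at any time, including ones never seen before.
        self.tools[tool.name] = tool

    def choose(self, task: str) -> Tool:
        words = set(task.lower().split())

        def score(tool: Tool):
            overlap = len(words & set(tool.description.lower().split()))
            # Prefer more overlap first, then lower cost.
            return (-overlap, tool.cost)

        return min(self.tools.values(), key=score)

    def solve(self, task: str) -> str:
        tool = self.choose(task)
        return f"[{tool.name}] {tool.run(task)}"
```

Because routing depends only on each tool's description, a tool registered after the orchestrator was built can be selected right away, which loosely mirrors the point about adapting to unseen tools.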

I'm throwing up this comment from Gareth about Andrej Karpathy, who just posted in the past several days about an LLM council that he created.

And it's a similar notion, right?

You have a group of LLMs, smaller ones, although he might be using full-scale, you know, frontier models as part of his council.

But setting them up in such a way that they can talk to each other and are directed to communicate with each other is really important.
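
To make the "talk to each other" point concrete, here is a minimal Python sketch of a council loop. It is an assumption-laden toy, not a description of the council mentioned above: each member is a plain function standing in for an LLM call, members see their peers' previous answers on each round, and the final answer is a simple majority vote. The `Member` type, `run_council`, and the `rounds` parameter are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict, List

# A member takes (question, peer_answers) and returns its own answer.
# Assumption: in practice this would wrap a call to a language model.
Member = Callable[[str, List[str]], str]


def run_council(question: str, members: Dict[str, Member], rounds: int = 2) -> str:
    # Round 1: every member answers independently, with no peer input.
    answers = {name: member(question, []) for name, member in members.items()}

    # Later rounds: each member sees what the others said and may revise.
    for _ in range(rounds - 1):
        answers = {
            name: member(
                question,
                [a for other, a in answers.items() if other != name],
            )
            for name, member in members.items()
        }

    # Final answer: the most common response across the council.
    return Counter(answers.values()).most_common(1)[0][0]
```

A usage example with stub members: two that always answer "4" and one that starts at "5" but adopts its peers' majority view will converge on "4" after the second round.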

So, yeah.