Andy Halliday

👤 Speaker

8321 total appearances

Podcast Appearances

The Daily AI Show

Spotify Engineers Stopped Writing Code

So, Deep Think.

367.079

I'm going to share my screen here quickly just to show you what it did.

368.401

But on the far left here, you see the ARC-AGI scale, and ARC-AGI-2 is the one we're looking at.

371.326

ARC-AGI is a really complex set of...

383.888

...logic puzzles, basically. ARC-AGI-1 was ultimately replaced by ARC-AGI-2 just less than a year ago.

387.835

When ARC-AGI-2 first came out, not so long ago, none of the models could get past anything like 5% on it. It tests advanced reasoning and knowledge, and it's part of the set of metrics being used to try to determine whether AI is approaching artificial general intelligence.

401.572

Now, Gemini 3 Deep Think scores 85% on the ARC-AGI-2 test.

429.533

What?

436.365

On Humanity's Last Exam, it's outperforming Claude Opus 4.6 Thinking Max by 8.4 percentage points.

437.767

And it's being compared against GPT-5.2.

448.366

All of these comparisons will obviously be updated once GPT-5.3 gets in the game a bit more.

452.491

But Gemini 3 Deep Think is a major advance.

460.56

And OpenAI is struggling to maintain its overall lead...

465.546

...while being inherently more expensive and underfunded compared to major players like Gemini.

471.113

And that's Google behind them, right?

478.95

Google has new money to support this.

482.157

And they're basically surpassing what OpenAI is doing here, in my view.

484.722

It has value to me.

523.18

And let me just add something about Google's Deep Think.

524.022

So on top of its reasoning capabilities, Google's Deep Think has basically...

526.105