Andy Halliday

The Daily AI Show

Chrome Becomes the First Real Agentic Browser

And it's a curated main set of questions for model evaluation.

2738.675

So that's the GPQA.

2743.042

This stat is also just for GPQA, not GPQA Diamond, which is a higher tier of the Graduate-Level Google-Proof Q&A benchmark with a somewhat harder set of questions.

2744.904

But the comparison across Llama, Gemma, and Qwen is going to be for that GPQA.

2760.488

Okay.

2766.176

So,

2766.497

Llama 3.2 does 17 on that benchmark, 17.

2767.531

Gemma does 24.

2776.304

Qwen 3, at roughly double the size, does 35.

2780.11

And LFM 2.5 from Liquid AI, at 1.2 billion parameters, does 39.

2785.057

OK, OK.

2794.919

But on your phone, this is a pretty big jump in my view.

2796.041

So now let's look at another important test, which is MMLU-Pro.

2799.426

So that is the Massive Multitask Language Understanding benchmark.

2805.435

And the Pro version of that was created as a harder set of those

2811.264

test questions, in effect.

2816.772

It was set up because most of the models, even the smaller open-source models, were getting too good at the original MMLU.

2819.536

So now MMLU-Pro.

2827.728

So how do these compare now?

2829.911

Okay.

2831.534