Andy Halliday

Speaker
8318 total appearances

Podcast Appearances

The Daily AI Show
Chrome Becomes the First Real Agentic Browser

And it's a curated main set of questions for model evaluation.

So that's the GPQA.

This stat is also just for GPQA, not GPQA Diamond, which is another, higher level of the graduate-level Google-proof Q&A benchmark, with a somewhat harder set of questions.

But the comparison across Llama, Gemma, and Qwen is going to be for that GPQA.

Okay.

So, Llama 3.2 does 17 on that benchmark, 17.

Gemma does 24.

Qwen 3, with roughly double the size, does 35.

And LFM 2 at 1.2 billion, LFM 2.5 from liquid.ai does 39.

OK, OK.

But on your phone, this is a pretty big jump in my view.

So now let's look at another important test, which is MMLU Pro.

So that is the Massive Multitask Language Understanding benchmark.

And the Pro version of that was created as a harder set of those test questions, in effect.

It was set up because most of the models, even the smaller open-source models, were getting too good at the original MMLU.

So now MMLU Pro.

So how do these compare now?

Okay.