Jaeden Schaefer
A lot of the leaderboards where people are like, look, basically anytime these AI companies can test their own model on a benchmark, it feels like they're cheating, being scammy in some way. And I don't mean to call the kettle black, but I feel like Anthropic, Google, and OpenAI have all been caught doing some form of this over the last few years. So I don't really put as much stock in those screenshots where they're like, we scored a 72 on this exam, and then it turns out they skipped a couple of the questions it probably didn't do well on.
Anyways, I'm not saying this is Google, but there is an AI company that has done this.
And so when it comes to these companies testing themselves, I trust them a lot less than the real world leaderboards.
So some of those examples are when, essentially, they have side-by-side comparisons of their model versus another model, and they have people vote blindly on which response they like better.
And when a new model really starts crushing it on those types of leaderboards, I take stock in that, because this is actual people blind-testing and saying that the model is better.
So that's great.
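To make that concrete, here's a minimal sketch of how blind side-by-side votes can be turned into a ranking. It assumes an Elo-style rating update, which arena-style leaderboards commonly use; the function names, starting ratings, and K-factor are illustrative, not any particular leaderboard's actual implementation.

```python
# Minimal sketch: turning blind side-by-side votes into a leaderboard ranking.
# Assumes an Elo-style update; values and names here are illustrative only.

K = 32  # how strongly a single vote moves a model's rating

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B given current ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Apply one blind vote: the voter preferred `winner`'s response."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - exp_win)
    ratings[loser] -= K * (1 - exp_win)

# Example: three blind votes between two anonymized models.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
for vote in ["model_a", "model_a", "model_b"]:
    loser = "model_b" if vote == "model_a" else "model_a"
    update(ratings, vote, loser)
print(ratings)  # model_a ends up slightly ahead after winning 2 of 3 votes
```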
One of these kind of real-world leaderboards comes from a company called Mercor. Their CEO, Brendan Foody, was posting about this. He says that Gemini 3.1 Pro is now the number one model on their leaderboard, which they call the Apex Agents Leaderboard.
It's basically a benchmark that is designed to measure how well these AI systems handle professional knowledge-based tasks.
And he says this basically just shows how quickly these models can move into a lot of the systems that agents are using to improve real work.
So what's interesting to me is it feels like Google is putting a lot of stock into this field of knowledge-based tasks.
They're doing a lot with education.
And it seems like it's paying off in the benchmarks.
With this whole release, the competition with OpenAI and Anthropic is obviously really heating up.
Everyone's rolling out new systems and it feels like they're only months apart.