Jaeden Schafer
Podcast Appearances
So some of those examples are when they essentially have side-by-side comparisons of their model versus another model, and people vote blindly on which response they like better.
And when a new model really starts crushing it on those types of leaderboards, I put stock in that because this is actual people blind testing and saying that the model is better.
One of these kind of real-world leaderboards comes from a company called Mercor.
Their CEO, Brendan Foody, was posting about this.
He says that Gemini 3.1 Pro is now number one on a leaderboard they have called the Apex Agents Leaderboard.
It's basically a benchmark that is designed to measure how well these AI systems handle professional knowledge-based tasks.
And he says that this is basically just showing how quickly this can move into a lot of the systems that agents are using to improve real work.
So what's interesting to me is it feels like Google's putting a lot of stock in kind of this knowledge-based tasks field.
They're doing a lot with education and it seems like it's paying off in the benchmarks.
With this whole release, this is obviously really heating up the competition with OpenAI and Anthropic.
Everyone's rolling out new systems and it feels like they're only months apart.
The other exciting update from Google is that they are expanding where their AI shows up.
So on the consumer side, YouTube is bringing the latest Gemini AI assistant to smart TVs.
They're bringing it to their gaming consoles and they're bringing it to their streaming devices.
So I think previously Gemini was basically just on mobile and on the web interface.
And it was an experimental feature that they've now just added, which basically lets viewers ask questions about what they're watching directly from their TV, which is honestly kind of cool.
Obviously, you could see some of these features on maybe YouTube on your phone or on your computer.