Corey Noles
Podcast Appearances
So we can wrap this up here in a minute.
How about at the half hour mark?
Yeah, this is the ARC-AGI test.
So it's meant to test how well an agent can adapt to a new situation.
And we wrote about this in today's Neuron, but basically what happened was every frontier model tried this and was at less than 1% ability to complete it.
Yeah, because they're trying to test the actual underlying model and how good it is.
And perhaps that's not really a fair assessment if the way that we're actually going to be using these things in real life is with a harness and as a part of a system.
So I sort of agree with Corey's point there.
If you want to expand on that, you can.
We're so past the point.
Like, I'm going to give OpenAI a bit of grief tomorrow about their ads in ChatGPT.
I'm just kidding.
I'm going to give OpenAI grief tomorrow because their ads in ChatGPT are so generic.
I'm like, we have generative AI.
Like, you could make generative UI at this point, and you're going to give us a little tiny image and ad?
Like, I get it.
Like, we don't want the ads to be obtrusive.
But at the same time, like... we're past the point where you should be putting out PDF documents.
Like, you should be able to build an entire website with videos embedded, with all of the tasks, like, showing them. Like, come on, we're way past the point of PDF research reports.