Dwarkesh Podcast
Francois Chollet, Mike Knoop - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution
You can already try to prompt one of the best models, like the latest Gemini, the latest GPT-4, with tasks from the public evaluation set.
And again, the problem is that these tasks are available as JSON files on GitHub.
These models are also trained on GitHub.
So they're actually trained on these tasks.
And yeah, that kind of creates uncertainty: if they can actually solve some of the tasks, is that because they memorized the answers or not?
You know, maybe you would be better off trying to create your own private, ARC-like, very novel test set.
Don't make the tasks difficult, don't make them complex; make them very obvious for humans, but make them as original as possible: unique, different. Then see how well GPT-4, or GPT-5, and so on do on them.
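As a rough illustration of that suggestion, here is a minimal Python sketch of one private ARC-style task stored in the same JSON layout as the public ARC files ("train" and "test" lists of input/output pairs, each grid a list of rows of integers 0 to 9). The task itself (mirror each row), the prompt format, and the helper names `grid_to_text`, `build_prompt`, and `score` are hypothetical, chosen only for illustration; the model call is left out and replaced by a hard-coded answer. Scoring here is plain exact match on the whole grid; official competition rules may differ (for example, by allowing more than one attempt per test input).

```python
import json

# Hypothetical private task in the ARC JSON layout: "train" demonstration
# pairs and "test" pairs, each grid a list of rows of integers 0-9 (colors).
# The hidden rule, chosen only for illustration, is "mirror each row left-to-right".
task_json = json.dumps({
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [4, 0]], "output": [[5, 0], [0, 4]]},
    ],
})


def grid_to_text(grid):
    """Render a grid as lines of digits, one simple way to put it in a prompt."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def build_prompt(task):
    """Format the demonstrations plus the first test input as plain text (hypothetical format)."""
    parts = []
    for i, pair in enumerate(task["train"], start=1):
        parts.append(f"Example {i} input:\n{grid_to_text(pair['input'])}")
        parts.append(f"Example {i} output:\n{grid_to_text(pair['output'])}")
    parts.append(f"Test input:\n{grid_to_text(task['test'][0]['input'])}")
    parts.append("Test output:")
    return "\n\n".join(parts)


def score(task, predictions):
    """Exact-match accuracy: a test pair counts only if every cell matches."""
    correct = sum(
        pred == pair["output"] for pred, pair in zip(predictions, task["test"])
    )
    return correct / len(task["test"])


task = json.loads(task_json)
print(build_prompt(task))

# Stand-in for a model's answer; in practice you would parse the model's reply.
model_answer = [[5, 0], [0, 4]]
print(score(task, [model_answer]))  # 1.0
```

Because the task is written by you and never published, a correct answer on it cannot come from having seen that exact file during training, which is the point of keeping the set private.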
Yeah, no, ARC is not a perfect benchmark.
I mean, I made it over four years ago, almost five now.
This was in a time before LLMs.
And I think we've actually learned a lot since then about what potential flaws there might be.
I think there is some redundancy in the set of tasks, which is, of course, against the goals of the benchmark.
Every task is supposed to be unique; in practice, that's not quite true.
I think there's also the fact that every task is supposed to be very novel, but in practice, some of them might not be.
They might be structurally similar to something that you might find online somewhere.
So we want to keep iterating and release an ARC 2 version later this year.
And I think when we do that, we're going to want to make the old private test set available.