Dwarkesh Podcast
Francois Chollet, Mike Knoop - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution
So maybe we won't be releasing it publicly, but what we could do is just create a test server
where you can query, get a task, you submit a solution, and of course, you can use whatever frontier model you want there.
So that way, because you actually have to query this API, you're making sure that no one is going to, by accident, train on this data.
It's unlike the current public ARC data, which is literally on GitHub.
So there's no question about whether the models are actually trained on it.
Yes, they are, because they're trained on GitHub.
So by gating access behind querying this API, we would avoid this issue.
And then we would see, you know, for people who actually want to try whatever technique they have in mind using whatever resources they want, that would be a way for them to get an answer.
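The gated test server described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual ARC Prize infrastructure: the class `GatedEvalServer`, its methods `get_task`, `submit`, and `score`, and the helper `evaluate` are all hypothetical names, and the sketch runs in-process rather than over a network. The point it shows is that a caller only ever sees task inputs and a pass/fail result, so the hidden answers cannot end up in a training set by accident.

```python
import secrets


class GatedEvalServer:
    """Hypothetical in-process stand-in for a query-only test server."""

    def __init__(self, tasks):
        # tasks maps task_id -> (task_input, expected_output).
        # Expected outputs are kept private and never returned to callers.
        self._answers = {tid: out for tid, (_, out) in tasks.items()}
        self._queue = [(tid, inp) for tid, (inp, _) in tasks.items()]
        self._open = {}  # one-time token -> task_id
        self._graded = 0
        self._correct = 0

    def get_task(self):
        """Hand out the next task input with a one-time token, or None when done."""
        if not self._queue:
            return None
        task_id, task_input = self._queue.pop(0)
        token = secrets.token_hex(8)
        self._open[token] = task_id
        return token, task_input

    def submit(self, token, answer):
        """Grade one submission; an unknown or reused token raises KeyError."""
        task_id = self._open.pop(token)
        correct = answer == self._answers[task_id]
        self._graded += 1
        self._correct += int(correct)
        return correct

    def score(self):
        """Fraction of graded submissions that were correct."""
        return self._correct / self._graded if self._graded else 0.0


def evaluate(server, solver):
    """Run any solver (a frontier model, a search program, ...) against the server."""
    while (item := server.get_task()) is not None:
        token, task_input = item
        server.submit(token, solver(task_input))
    return server.score()
```

Here `solver` can be any callable that maps a task input to an answer, which matches the idea that people can bring whatever technique and resources they want and still get a trustworthy score.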
Yeah, the point of having a money prize is that you attract more people and you get them to try to solve it.
And if there's an easy way to hack the benchmark that reveals that the benchmark is flawed, then you're going to know about it.
In fact, that was the point of the original Kaggle competition back in 2020 for ARC.
I was running this competition because I had released this dataset and I wanted to know if it was hackable, if you could cheat.
So there was a small money prize at the time, that was like 20K, and this was right around the same time as GPT-3 was released.
So people of course tried GPT-3 on the public data, it scored zero.
But I think what the first contest taught us is that there is no obvious shortcut, right?
And well, now there's more money, there's going to be more people looking into it.
Well, we're going to find out.
We're going to see if the benchmark is going to survive.
And you know, if we end up with a solution that is not trying to brute-force the space of possible ARC tasks, that's just trained on core knowledge, I don't think it's necessarily going to be in and of itself AGI, but it's probably going to be a huge milestone on the way to AGI.