Francois Chollet
Podcast Appearances
And you don't want them to be hard-coded like what we saw for the typical ARC DSLs.
You want them to be learned from examples.
But then you also want to do some degree of deep search.
As long as you're only doing very shallow search, you are limited to local generalization.
If you want to generalize further,
More broadly, this depth of search is going to be critical.
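To make the point about search depth concrete, here is a minimal sketch of a depth-limited program search. It is an illustration only: the three grid primitives (`flip_h`, `flip_v`, `transpose`) and the `search` helper are hypothetical, and they are hard-coded here purely to keep the toy small, whereas the speaker argues that real building blocks should be learned from examples. The target transformation, a 90-degree clockwise rotation, cannot be expressed by any single primitive, so a depth-1 search fails while a depth-2 search finds a solution.

```python
from itertools import product

# Hypothetical toy primitives on a grid (a list of rows); not an actual ARC DSL.
def flip_h(g):
    return [row[::-1] for row in g]

def flip_v(g):
    return g[::-1]

def transpose(g):
    return [list(r) for r in zip(*g)]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def run(program, grid):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def search(examples, max_depth):
    """Return the shortest program consistent with all examples, or None."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# One input/output pair demonstrating a 90-degree clockwise rotation.
examples = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]

shallow = search(examples, max_depth=1)  # no single primitive fits
deep = search(examples, max_depth=2)     # a two-step composition fits
print(shallow, deep)
```

The number of candidate programs grows exponentially with depth (3 at depth 1, 9 at depth 2, and so on), which is why deeper search and the available compute per task are tied together.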
There's a forcing function of efficiency in the eval.
But here's the thing.
You only have 100 test tasks.
So the amount of compute you have available for each task is actually quite a bit, especially if you contrast that with the simplicity of each task.
solving ARC.
Absolutely.
So in the context of the competition, we want to see how much progress we can make with limited resources.
But you're entirely right that it's a super interesting open question.
What could the biggest model out there actually do on ARC?
So we actually also want to make available a private, sort of one-off track where you can submit a VM to us, and you can put any model you want on it.
Like, you can take one of the largest open-source models out there, fine-tune it, do whatever you want, and just give us an image.
And then we run it on an H100 for like 24 hours or something, and you see what you get.
And this is actually what people are already doing.
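As a rough back-of-the-envelope check on the figures mentioned above, the sketch below splits a wall-clock budget evenly across tasks. It assumes, without confirmation from the speaker, that the roughly 24-hour run would cover the same 100 test tasks and that time is divided equally; the function name `per_task_budget_minutes` is hypothetical.

```python
def per_task_budget_minutes(total_hours: float, num_tasks: int) -> float:
    """Evenly split a wall-clock budget across tasks; returns minutes per task."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return total_hours * 60 / num_tasks

# Assumed inputs: about 24 hours of compute and 100 test tasks.
budget = per_task_budget_minutes(total_hours=24, num_tasks=100)
print(f"{budget:.1f} minutes per task")
```

Under those assumptions, each task would get on the order of a quarter of an hour of accelerator time.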