Francois Chollet
So I think you want to add a code interpreter to the system.
I think that's great.
That's totally legitimate.
The part that would be cheating is trying to anticipate what might be in the test set: brute-forcing the space of possible tasks, training a memorization system on it, and then relying on the fact that you're generating so many tasks, millions and millions and millions, that inevitably there's going to be some overlap between what you're generating and what's in the test set.
I think that's defeating the purpose of the benchmark, because then you can solve it without any need to adapt, just by fetching a memorized solution.
So hopefully ARC will resist that, but, you know, no benchmark is necessarily perfect.
So maybe there's a way to hack it.
And I guess we are going to get an answer very soon.
You want to input core knowledge, like ARC-like core knowledge, into the model, but surely you don't need tens of millions of tasks to do this.
Like core knowledge is extremely basic.
I would definitely file that under core knowledge.
Like, core knowledge includes basic physics, for instance bouncing or trajectories.
That would be included.
But yeah, I think you're entirely right.
The reason why as a human you're able to quickly figure out the solution is because you have this set of building blocks, this set of patterns in your mind that you can recombine.
Core knowledge can be learned.
And I think in the case of humans, some amount of core knowledge is something that you're born with.
Like we're actually born with a small amount of knowledge about the world we're going to live in.
We are not blank slates.