Francois Chollet
Sure.
So ARC is intended as a kind of IQ test for machine intelligence.
And what makes it different from most LLM benchmarks out there is that it's designed to be resistant to memorization.
So if you look at the way LLMs work, they're basically this big interpolative memory.
And the way you scale up their capabilities is by trying to cram as much knowledge and patterns as possible into them.
And by contrast, ARC does not require a lot of knowledge at all.
It's designed to only require what's known as core knowledge, which is basic knowledge about things like elementary physics, objectness, counting, that sort of thing.
The sort of knowledge that any four-year-old or five-year-old possesses, right?
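For concreteness, here is a minimal sketch of what one of these tasks looks like as data, assuming the JSON layout used in the public ARC repository (github.com/fchollet/ARC); the toy transformation rule and the solve function below are hypothetical stand-ins:

```python
# A toy ARC-style task, assuming the JSON layout used in the public
# ARC repository: "train" holds demonstration pairs, "test" holds the
# pairs to solve, and each grid is a list of rows of color indices 0-9.
import json

task = json.loads("""
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
  ]
}
""")

def solve(grid):
    # Hypothetical rule for this particular toy task (mirror each row);
    # in real ARC, the rule must be inferred fresh from the train pairs.
    return [row[::-1] for row in grid]

# A prediction counts only if it reproduces the output grid exactly.
for pair in task["train"] + task["test"]:
    assert solve(pair["input"]) == pair["output"]
print("all pairs solved")
```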
But what's interesting is that each puzzle in ARC is novel, something you've probably not encountered before, even if you've memorized the entire internet.
And that's what makes ARC challenging for LLMs.
And so far, LLMs have not been doing very well on it.
In fact, the approaches that are working well are more in the direction of discrete program search, of program synthesis.
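As a rough illustration of that direction, here is a minimal sketch of discrete program search over a toy DSL of grid transformations; the primitives and the depth limit are illustrative assumptions, not the DSLs used by actual ARC solvers:

```python
# Brute-force program search: enumerate compositions of DSL primitives
# and keep the first program consistent with all demonstration pairs.
from itertools import product

def identity(g): return g
def rot90(g):    return [list(row) for row in zip(*g[::-1])]
def flip_h(g):   return [row[::-1] for row in g]
def flip_v(g):   return g[::-1]

PRIMITIVES = [identity, rot90, flip_h, flip_v]

def run(program, grid):
    for f in program:
        grid = f(grid)
    return grid

def search(train_pairs, max_depth=3):
    """Enumerate compositions of primitives, shortest first, and return
    the first program that maps every train input to its output."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, inp) == out for inp, out in train_pairs):
                return program
    return None

# Toy task: the hidden rule is a 90-degree rotation.
train = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]
program = search(train)
print([f.__name__ for f in program])   # ['rot90']
print(run(program, [[5, 6], [7, 8]]))  # predicted test output: [[7, 5], [8, 6]]
```

Real solvers in this family use much richer DSLs and smarter search than exhaustive enumeration, but the shape of the idea is the same: search for a program that explains the demonstrations, then apply it to the test input.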
Right.
I'm pretty skeptical that we're going to see an LLM do 80% in a year.
That said, if we do see it, you would also have to look at how this was achieved.
If you just train the model on millions or billions of puzzles similar to ARC, so that you're relying on some overlap between the tasks you train on and the tasks you're going to see at test time, then you're still using memorization, right?
And maybe it can work, you know.
Hopefully ARC is going to be good enough to be resistant to this sort of brute-force attempt.
But you never know.
Maybe it could happen.