Francois Chollet
And you see that, yeah, the LLM can encode just fine the solution programs for tasks that it has seen before. So it does not really have a problem parsing the input or figuring out the program.
The reason why LLMs don't do well on ARC is really just the unfamiliarity aspect, the fact that each new task is different from every other task.
Basically, you cannot memorize the solution programs in advance.
You have to synthesize a new solution program on the fly for each new task.
And that's really what LLMs are struggling with.
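To make that distinction concrete, here is a minimal sketch of memorized lookup versus on-the-fly program synthesis. The DSL primitives, task encoding, and brute-force search below are hypothetical illustrations of the point being made, not ARC's actual task format or any real solver.

```python
# Minimal sketch (illustrative only): retrieving a memorized program vs.
# synthesizing one on the fly for an unseen task.
from itertools import product

# Tiny hypothetical DSL of grid transformations (grids are tuples of tuples of ints).
DSL = {
    "flip_h": lambda g: tuple(row[::-1] for row in g),
    "flip_v": lambda g: g[::-1],
    "increment": lambda g: tuple(tuple((c + 1) % 10 for c in row) for row in g),
}

def retrieve(cache, demos):
    # Memorization: only works if this exact task was seen before.
    return cache.get(demos)

def synthesize(demos, max_depth=2):
    # On-the-fly synthesis: search compositions of DSL primitives that
    # explain every demonstration pair of the new task.
    for depth in range(1, max_depth + 1):
        for names in product(DSL, repeat=depth):
            def program(g, names=names):
                for name in names:
                    g = DSL[name](g)
                return g
            if all(program(inp) == out for inp, out in demos):
                return names
    return None

if __name__ == "__main__":
    # One demonstration pair: the output is the input flipped horizontally.
    demos = ((((1, 2), (3, 4)), ((2, 1), (4, 3))),)
    print(retrieve({}, demos))   # None: task never seen, memorization fails
    print(synthesize(demos))     # ('flip_h',): found by searching the DSL
```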
Like any smart human should be able to do 90%, 95% on ARC.
A smart human.
A smart human.
But even a five-year-old, with very, very little knowledge, could definitely do over 50%.
So we actually tried with average humans; they score about 85%.
That's right.
Yeah, so there's a bunch of interesting points here. There is indeed a branch of LLM approaches, spearheaded by Jack Cole, that are doing quite well, that are in fact state of the art. But you have to look at what's going on there. There are two things. The first thing is that to get these numbers, you need to pre-train your LLM on millions of generated ARC tasks.
And of course, if you compare that to a five-year-old child looking at ARC for the first time, the child has never done an IQ test before, has never seen something like an ARC test before.
The only overlap between what they know and what they have to do in the test is core knowledge: knowing about counting and objects and symmetries and things like that.
And still, they're going to do really well, and they're going to do much better than the LLM trained on millions of similar tasks. And the second thing to note about the Jack Cole approach is that one thing that's really critical to making the model work at all is test-time fine-tuning. And that's something that's really missing, by the way, from LLM approaches right now. Most of the time, when you're using an LLM, it's just doing static inference.
The model is frozen, and you're just prompting it, and then you're getting an answer.
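As a concrete illustration of static inference versus test-time fine-tuning, here is a minimal sketch using a toy PyTorch model. The model, data shapes, and hyperparameters are all hypothetical; this is not the actual MindsAI/Jack Cole pipeline, just the general pattern of briefly adapting the weights on the demonstration pairs of the current task before predicting.

```python
# Minimal sketch (illustrative only): frozen-model inference vs.
# test-time fine-tuning on the demonstration pairs of one task.
import copy
import torch
import torch.nn as nn

class TinySolver(nn.Module):
    """Stand-in for a pretrained model mapping a flattened grid to a flattened grid."""
    def __init__(self, grid_cells=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(grid_cells, 128), nn.ReLU(), nn.Linear(128, grid_cells)
        )

    def forward(self, x):
        return self.net(x)

def static_inference(model, test_input):
    # The frozen model is simply prompted with the test input: no adaptation.
    model.eval()
    with torch.no_grad():
        return model(test_input)

def test_time_fine_tune(model, demo_inputs, demo_outputs, steps=20, lr=1e-3):
    # Clone the pretrained weights, then briefly fit them to the few
    # demonstration pairs of the *current* task before predicting.
    adapted = copy.deepcopy(model)
    adapted.train()
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(adapted(demo_inputs), demo_outputs)
        loss.backward()
        opt.step()
    return adapted

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinySolver()
    # Hypothetical task: three demonstration pairs plus one test input (10x10 grids, flattened).
    demo_in, demo_out = torch.randn(3, 100), torch.randn(3, 100)
    test_in = torch.randn(1, 100)

    frozen_answer = static_inference(model, test_in)          # static inference only
    adapted = test_time_fine_tune(model, demo_in, demo_out)   # adapt on this task's demos
    adapted_answer = static_inference(adapted, test_in)       # then predict
```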