Francois Chollet
But even a five-year-old, so with very, very little knowledge, they could definitely do over 50%.
So we actually tried with average humans; they score about 85.
That's right.
Yeah, so there's a bunch of interesting points here. There is indeed a branch of LLM approaches, spearheaded by Jack Cole, that are doing quite well, that are in fact state of the art. But you have to look at what's going on there. There are two things. The first thing is that to get these numbers, you need to pre-train your LLM on millions of generated ARC tasks.
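(As a rough illustration of what "generated ARC tasks" could look like as LLM pre-training text: the sketch below serializes a toy ARC-style task, with its grids of digits 0-9, into a plain text sequence. The helper names and the exact format are made up for illustration, not the format used in that work.)

def grid_to_text(grid):
    # Render a 2D list of ints (0-9) as rows of digit characters.
    return "\n".join("".join(str(cell) for cell in row) for row in grid)

def task_to_training_text(task):
    # Serialize one task's demonstration pairs plus its test pair
    # into a single training document.
    parts = []
    for pair in task["train"] + task["test"]:
        parts.append("INPUT:\n" + grid_to_text(pair["input"]))
        parts.append("OUTPUT:\n" + grid_to_text(pair["output"]))
    return "\n".join(parts)

# Toy 2x2 "identity" task in the usual ARC JSON-like structure.
toy_task = {
    "train": [{"input": [[1, 0], [0, 1]], "output": [[1, 0], [0, 1]]}],
    "test":  [{"input": [[2, 0], [0, 2]], "output": [[2, 0], [0, 2]]}],
}
print(task_to_training_text(toy_task))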
And of course, if you compare that to a five-year-old child looking at ARC for the first time, the child has never done an IQ test before, has never seen something like an ARC test before.
The only overlap between what they know and what they have to do in the test is core knowledge, is knowing about counting and objects and symmetries and things like that.
And still, they're going to do really well, and they're going to do much better than the LLM trained on millions of similar tasks. And the second thing to note about the Jack Cole approach is that one thing that's really critical to making the model work at all is test-time fine-tuning. And that's something that's really missing, by the way, from LLM approaches right now: most of the time when you're using an LLM, it's just doing static inference.
The model is frozen, and you're just prompting it, and then you're getting an answer.
So the model is not actually learning anything on the fly.
Its state is not adapting to the task at hand.
And what Jack Cole is actually doing is that for every test problem, he is, on the fly, fine-tuning a version of the LLM for that task.
And that's really what's unlocking performance.
If you don't do that, you get like 1%, 2%.
So basically something completely negligible.
And if you do test time fine-tuning and you add a bunch of tricks on top, then you end up with interesting performance numbers.
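(A minimal sketch of the test-time fine-tuning idea being described, assuming a Hugging Face causal LM, with "gpt2" as a stand-in base model and the grid_to_text helper from the earlier sketch; the steps, learning rate, and prompt format are illustrative, not the actual recipe or its extra tricks.)

import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

def solve_task(task, steps=20, lr=1e-4):
    # Work on a throwaway copy so the base model stays frozen.
    model = copy.deepcopy(base_model)
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    # Fine-tune on this task's demonstration pairs only.
    demo_text = "\n".join(
        "INPUT:\n" + grid_to_text(p["input"]) +
        "\nOUTPUT:\n" + grid_to_text(p["output"])
        for p in task["train"]
    )
    batch = tokenizer(demo_text, return_tensors="pt")
    for _ in range(steps):
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Predict: prompt the adapted model with the test input and let it
    # continue with an output grid.
    model.eval()
    prompt = (demo_text + "\nINPUT:\n" +
              grid_to_text(task["test"][0]["input"]) + "\nOUTPUT:\n")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])

print(solve_task(toy_task))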
So I think what it's doing is trying to address one of the key limitations of LLMs today, which is the lack of active inference.
It's actually adding active inference to LLMs.