Dwarkesh Podcast

Francois Chollet, Mike Knoop - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution

11 Jun 2024

1h 33m

17523 words

3 speakers

Audio

Description

Here is my conversation with Francois Chollet and Mike Knoop on the $1 million ARC-AGI Prize they're launching today.

I did a bunch of Socratic grilling throughout, but Francois’s arguments about why LLMs won’t lead to AGI are very interesting and worth thinking through.

It was really fun discussing/debating the cruxes. Enjoy!

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here.

Timestamps

(00:00:00) – The ARC benchmark
(00:11:10) – Why LLMs struggle with ARC
(00:19:00) – Skill vs intelligence
(00:27:55) – Do we need “AGI” to automate most jobs?
(00:48:28) – Future of AI progress: deep learning + program synthesis
(01:00:40) – How Mike Knoop got nerd-sniped by ARC
(01:08:37) – Million $ ARC Prize
(01:10:33) – Resisting benchmark saturation
(01:18:08) – ARC scores on frontier vs open source models
(01:26:19) – Possible solutions to ARC Prize

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Featured

Dwarkesh Patel

Mike Knoop

Francois Chollet

Topics

ARC benchmark, Francois Chollet, Mike Knoop

Transcription

Full Episode

0.031 - 24.621 Dwarkesh Patel

Okay, today I have the pleasure to speak with Francois Chollet, who is an AI researcher at Google and creator of Keras. And he's launching a prize in collaboration with Mike Knoop, the co-founder of Zapier, who we'll also be talking to in a second, a million-dollar prize to solve the ARC benchmark that he created. So first question, what is the ARC benchmark? And why do you even need this prize?

24.661 - 27.945 Dwarkesh Patel

Why won't the biggest LLM we have in a year be able to just saturate it?

27.925 - 52.819 Francois Chollet

Sure. So ARC is intended as a kind of IQ test for machine intelligence. And what makes it different from most LLM benchmarks out there is that it's designed to be resistant to memorization. So if you look at the way LLMs work, they're basically this big interpolative memory. And the way you scale up their capabilities is by trying to cram as much knowledge and patterns as possible into them.

52.799 - 74.655 Francois Chollet

And by contrast, ARC does not require a lot of knowledge at all. It's designed to only require what's known as core knowledge, which is basic knowledge about things like elementary physics, objectness, counting, that sort of thing. The sort of knowledge that any four-year-old or five-year-old possesses, right?

74.635 - 103.039 Francois Chollet

But what's interesting is that each puzzle in ARC is novel, is something that you've probably not encountered before, even if you've memorized the entire internet. And that's what makes ARC challenging for LLMs. And so far, LLMs have not been doing very well on it. In fact, the approaches that are working well are more towards discrete program search, program synthesis.

103.019 - 120.811 Dwarkesh Patel

First of all, I'll make a comment that I'm glad that, as a skeptic of LLMs, you have yourself put out a benchmark. Suppose that the biggest model we have in a year is able to get 80% on this. Is it accurate to say that your view would then be that we are on track to AGI with LLMs? How would you think about that?

121.532 - 131.909 Francois Chollet

Right. I'm pretty skeptical that we're going to see an LLM do 80% in a year. That said, if we do see it, you would also have to look at how this was achieved.

132.35 - 151.858 Francois Chollet

If you just train the model on millions or billions of puzzles similar to ARC, so that you're relying on the ability to have some overlap between the tasks that you train on and the tasks that you're going to see at test time, then you're still using memorization, right? And maybe it can work, you know, hopefully

151.838 - 168.002 Francois Chollet

ARC is going to be good enough that it's going to be resistant to this sort of attempt at brute forcing. But you never know. Maybe it could happen. I'm not saying it's not going to happen. ARC is not a perfect benchmark. Maybe it has flaws. Maybe it could be hacked in that way.
