Mike Knoop
Podcast Appearances
Because it is public, it could have leaked into the training dataset.
We want to evolve the ARC dataset.
That is a goal that we want to do.
I think, Francois, you mentioned it's not perfect.
I guess that that will be interesting, and maybe that'll indicate that, you know, you had to do something hacky with current models. As they get better, you won't have to do something hacky. I'm also going to be very curious to see how these multimodal models, if they will perform natively much better at ARC-like tests. If ARC survives three months from here, we'll blow up the prize. I think we're about to make a really important moment of, like, contact with reality by blowing up the prize, putting a much bigger prize pool against it. We're going to learn really quickly if there's, like, low-hanging fruit of ideas.
Again, I think new ideas are needed.
I think anyone listening to this might have the idea in their head.
And I'd encourage everyone to give it a try.
And I think as time goes on, that adds strength to the argument that we've sort of stalled out in progress and that new ideas are necessary to beat ARC.
It'd certainly be motivating.
We're going to keep running the contest until somebody puts a reproducible open source version into the public domain.
So even if somebody privately beats the ARC eval, we're going to still keep the prize money until someone can reproduce it and put the public reproducible version out there.
We're making contact with reality as well with this, right?
We're going to learn a lot, I think, about what the actual limits of the compute were.
If someone showed up and said, hey, here's a closed source model that...
like, I'm getting 50-plus percent on, I think that would probably update us on, like, okay, perhaps we should increase the amount of compute that we give on the private test set in order to balance. You know, some of the decisions initially are somewhat arbitrary, in order to learn about, okay, what do people want?
What does progress look like?
And I think both of us are sort of committed to evolving it over time in order to be the best or the closest to perfect as we can get it.