Mike Knoop
And there's a $500,000 prize for the first team that can get to the 85% benchmark.
We don't actually expect that to happen this year.
One of the early statisticians at Zapier gave me this line that has always stuck with me, that the longer it takes, the longer it takes.
So my prior is that ARC is going to take years to solve.
And so we're also going to break it down and do a progress prize this year.
So there's a $100,000 progress prize, which we will pay out to the top scorers.
So $50,000 is going to go to the top objective scores this year on the Kaggle leaderboard, since we're hosting the contest on Kaggle.
And then we're going to have a $50,000 pot set aside for a paper award, for the best paper that conceptually explains the scores its team was able to achieve.
And one of the interesting things, I think, is that we're going to require that, in order to win the prize money, you put your solution or your paper out into the public domain.
The reason for this is that typically with contests, you see a lot of closed-off behavior and very little sharing.
People keep things private and secret.
They want to keep their edge to themselves during the contest period.
And because we expect it's going to take multiple years, we want to set up an iterated game here.
So the plan is: at the end of November, we will award the $100,000 in progress prize money, and then use the downtime between December, January, and February to share out all the knowledge from the top scorers and the approaches folks were taking, in order to re-baseline the community up to whatever the state of the art is. Then we'll run the contest again next year, and keep doing that on a yearly basis until we get to 85%.
There's a 12-hour runtime limit, basically.
I think it's worth pointing out that there are two different test sets.
There is a public test set in the public GitHub repository that anyone can use: train on it, feed it into an API call, whatever you'd like to do.
And then there's the private test set, which is the 100 tasks that are actually measuring the state of the art.
So I think it's pretty open-ended and interesting to have folks at least go and try the public test set.
Now, there is an asterisk on any score that's reported against the public test set.
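For anyone who wants to go try the public test set, here is a minimal sketch of reading one task, assuming a local clone of the public ARC-AGI GitHub repo (github.com/fchollet/ARC-AGI) and its JSON task layout; the directory path below is illustrative.

```python
import json
from pathlib import Path

# Illustrative path: assumes the public ARC-AGI repo is cloned locally
# and that evaluation tasks live under data/evaluation as JSON files.
data_dir = Path("ARC-AGI/data/evaluation")

# Each task file holds demonstration pairs ("train") and held-out
# test pairs ("test"); each grid is a list of rows of ints 0-9.
for path in sorted(data_dir.glob("*.json"))[:1]:  # peek at one task
    task = json.loads(path.read_text())
    for pair in task["train"]:
        print("demo input :", pair["input"])
        print("demo output:", pair["output"])
    for pair in task["test"]:
        print("test input :", pair["input"])
```

The same per-task format applies to the private test set, but those 100 tasks are never published, which is why only private-set scores count as state of the art.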