Rob Wiblin
But then we also had this other arm, which was basically like types of evidence other than benchmarks, like surveys, RCTs, all the things we talked about.
We got much less interest for that.
And I think it just reflects that it's harder to think of good ways to measure things outside of benchmarks, even though everyone agrees benchmarks have major weaknesses and consistently overestimate real-world performance, because benchmarks are clean and contained and the real world is messy and open-ended.
But one thing that I'm excited about that came out of the second RFP is that Forecasting Research Institute is running this panel called LEAP, which is the Longitudinal Experts on AI panel.
Where they just take like 100 or 200 AI experts, economists and super forecasters and have them answer a bunch of granular questions about where AI is going to be in the next six months, in the next year, in the next five years.
Both benchmark scores, but also things like: will companies report that they're slowing down hiring because of AI? Or will an AI be able to plan an event in the real world? These kinds of things.
So I'm very excited about that.
And I think, honestly, having people make subjective predictions, explain how those predictions are connected to their longer-run worldviews, and then check over time who's right might be the most flexible tool we have.
So I'm very excited to see where LEAP goes.
But it is challenging to get indicators that are clearly early warnings, so that we can actually do something about it if the people who are more concerned are right, but that are also clearly valid and not easy to dismiss on the other side as not realistic enough to matter.
Yeah.
And they may decide not to release those products.
If there's one company that's well ahead of the others, then, as in AI 2027, the company that was ahead in the AI race was depicted as so far ahead of its competitors that it could afford to keep its best models internal and release only less capable products to the rest of the world.
Its competitors were far enough behind that they couldn't undercut it or compete with it by releasing a better product.
In the story, the company in the lead, Open Mind, is basically just releasing products that are slightly better than the state of the art of its competitors.
Yeah, so I think there's a whole spectrum of evidence about AI capabilities where on the one hand, the sort of easiest to test but the least informative is benchmark results.