Rob Wiblin
So I guess my high-level formula would be: look at the places where adoption has penetrated the most, and start to measure speed-ups in actual output variables.
I think it would be really cool if there were a solar panel manufacturing plant that had really adopted AI, and we started to see how much more quickly they could manufacture solar panels, or how much better they could make them.
I think the most important thing, or the thing I ultimately care about, is the AI stack.
So chip design, chip manufacturing, manufacturing the equipment that manufactures chips, and then, of course, the software piece of it, too.
The software piece is the earliest piece.
But I think we should be monitoring degree of AI adoption, self-reported AI acceleration, RCTs, anything we can get our hands on for the entire stack.
Because I think the moment when AI futurists expect things to start going much, much faster roughly coincides with the point when AI has fully automated the process of making more AI.
So that's really something to watch out for.
And then, on a separate track, you also want to be looking at the earliest power users, no matter where they are, because you can get insight that transfers to these other domains.
I don't know.
I'm really curious about this.
Yeah, so I put out a pair of requests for proposals in late 2023.
One of them was on building difficult, realistic benchmarks for AI agents.
So at the time, very few people were working with AI agents, and only a couple of agentic benchmarks had come out, including METR's benchmark that I discussed on the show last time.
And so I was really excited about it.
It felt like the moment to move on from giving LLMs multiple-choice tests to giving them real tasks: book me a flight, or make this piece of software work, write the tests, run the tests, iterate until the thing actually works.
That was a very new idea at the time, but the time was also right for it, and a lot of academic researchers were excited about moving into the space.
So we got a lot of applications for that arm of our request for proposals, and we funded a bunch of cool benchmarks, including Cybench, which is a cyber offense benchmark that's used in a lot of standard evaluations now.