Rob Wiblin
So one thing I'm interested in is what fraction of pull requests to your internal codebase were mostly written by AI and mostly reviewed by AI.
So humans are, for the most part, not involved on either side of this equation.
And I'd be very interested in watching that number climb, because I think it's an indication both of AI capabilities and of how much deference the companies are giving to AIs.
And eventually, if things are going to go crazy fast, the AIs have to be doing most things, including most management and approval and review.
Because if humans have to do that stuff, then things can only go so fast.
So I really want to track how much higher-level decision-making authority is being given to the AIs in practice inside the companies.
I think there are probably a bunch of other things that we could send basically as a survey.
How much do you use AIs for this type of thing, for that type of thing?
How much speedup do you get, or subjectively think you get?
If you're running any internal RCTs, I would, of course, love to know the results of that.
Yeah, I think that is a good thing to do, but I don't think that benchmarks alone will actually lead anyone to sound the alarm, because the thing with benchmarks is that they saturate.
They always have the S-curve shape, and the benchmarks we have right now are harder than the previous generation of benchmarks, but it's still far from the case that I feel confident that if your AI gets a 100% score on all these benchmarks, then it's a threat to the world and could take over the world.
I still think the benchmarks we have right now are well below that.
So what's probably going to happen is that these benchmarks are going to get saturated.
Then there's going to be a next generation of benchmarks people make.
And then those benchmarks are going to tick up and then get saturated.
So I think we need some kind of real-world measure before we can start sounding the alarm.