Balaji Srinivasan
They can do more and longer, but a human can do and so on and so forth.
After I posted this, somebody actually, in a useful way on X, gave me feedback.
And there's a whole post that I'm going to put up which says this is actually a flawed chart, that basically these...
This is not actually a good measure of how long those things can take.
It's not a true benchmark.
And it's something where it's like, what a human can do, this thing can do in some amount of time, at a 50% accuracy rate, and so on and so forth.
But they think, they argue it's a very tortured benchmark.
So I'll give that citation after.
So this one, big grain of salt, and I may actually write a post detailing that critique and giving it more distribution.
Nevertheless, a lot of people think this is happening, right?
And I think it's happening to some extent, maybe not quite as much as this.
Why is it happening to some extent?
Well, clearly you and I have used Claude Code; at work, we use Claude Code.
Clearly it can do more tasks for longer than it could in 2022, right?
Yes, that's right.
So there's the argument of this graph.
There's a counterargument that critiques the numbers in this graph and the benchmarks.
There's a counterargument, which is it definitely has been increasing, right?