Nathaniel Whittemore
π€ SpeakerVoice Profile Active
This person's voice can be automatically recognized across podcast episodes using AI voice matching.
Appearances Over Time
Podcast Appearances
That is the percentage of Claude's production code that is authored by Claude itself.
They also note that, as they put it, the code that Claude writes is good and improving.
Good code, they say, means two things.
It works and is written in a manner that allows another engineer to understand it and build upon it.
On the first criterion, they say, the evidence is clear.
The rate at which anthropic staff correct, redirect, or take over mid-task from Claude has been falling steadily for a year, including on the most complex and open-ended tasks.
This means problems with no clear specification, where the engineer isn't sure what the answer looks like.
As evidence, they point to a chart of Claude Code's session success rate, where across trivial tasks, routine tasks, substantial tasks, and open-ended problems, the success rate of all of those has climbed well above 60%, and for the trivial, routine, and substantial tasks, well above 80%, from a much lower place less than a year ago.
They also note that the mode of how Claude interacts with the codebase is changing.
Claude, they write, is getting better at proposing its own experiments.
They point to research that was published in April of this year that was exploring whether a weaker AI could manage a stronger AI.
The evidence suggests that the human role is narrowing at each step in the AI development process.
Once human and AI-authored code quality reach parity, humans will stop writing code entirely and shift to only reviewing it.
But if they can't review code as quickly as Claude can generate it, human review will become the bottleneck to AI development.
Similarly, once Claude can run experiments, the question shifts towards which of those experiments is worth running.
Put simply, the doing, writing the code, running the experiment, producing the result, now costs almost nothing in human time, even if it still has costs in compute.
An area of human comparative advantage for now is research, taste, and judgment, including choosing which problems matter, which results to trust, and when an approach is a dead end.
Indeed, they continue, "...the work that is still in human hands, choosing which problems to work on, is what matters most."
Without that judgment, Claude is a capable assistant, but not as a system that could drive AI progress on its own.
They write, it's genuinely unclear whether today's training methods and architectures could unlock that capacity.