Nathaniel Whittemore
π€ SpeakerVoice Profile Active
This person's voice can be automatically recognized across podcast episodes using AI voice matching.
Appearances Over Time
Podcast Appearances
As evidence, they point to a chart of Claude Code's session success rate, where across trivial tasks, routine tasks, substantial tasks, and open-ended problems, the success rate of all of those has climbed well above 60%, and for the trivial, routine, and substantial tasks, well above 80%, from a much lower place less than a year ago.
They also note that the mode of how Claude interacts with the codebase is changing.
Claude, they write, is getting better at proposing its own experiments.
They point to research that was published in April of this year that was exploring whether a weaker AI could manage a stronger AI.
The evidence suggests that the human role is narrowing at each step in the AI development process.
Once human and AI-authored code quality reach parity, humans will stop writing code entirely and shift to only reviewing it.
But if they can't review code as quickly as Claude can generate it, human review will become the bottleneck to AI development.
Similarly, once Claude can run experiments, the question shifts towards which of those experiments is worth running.
Put simply, the doing, writing the code, running the experiment, producing the result, now costs almost nothing in human time, even if it still has costs in compute.
An area of human comparative advantage for now is research, taste, and judgment, including choosing which problems matter, which results to trust, and when an approach is a dead end.
Indeed, they continue, "...the work that is still in human hands, choosing which problems to work on, is what matters most."
Without that judgment, Claude is a capable assistant, but not as a system that could drive AI progress on its own.
They write, it's genuinely unclear whether today's training methods and architectures could unlock that capacity.
But even if we suppose that Claude never achieves good research taste, a conservative reading of our evidence still implies compounding acceleration.
Now the meat of the piece, and the part that's generating the most discussion, is the last section on three possible futures.
The first possible future, which they say they include for completeness but don't believe it's likely, is one in which the trend stalls but today's AI capabilities are widely diffused.
They write, This article features many exponential trajectories, but these trajectories may actually turn out to be S-curves.
We may be approaching the bend in the curve, where returns to scale diminish and the line straightens then flattens.
The judgment that separates a competent researcher from a great one might be a capability that cannot come from scaling up training inputs like compute and data.
If so, getting past this bottleneck would require a new idea, like an architectural approach that supplants the transformer architecture that all current frontier models use.