Cal Newport
All right.
So, chart one: once we introduced these tools for software development in late 2025, we began producing a lot more code using AI.
All right, let's go to the second major chart in this paper.
It's titled "Claude Code Session Success Rate."
What we see here are various colored lines.
Each of these represents a different type of task you could ask an AI LLM to solve: trivial tasks, routine tasks, substantial tasks.
And then, at the bottom, open-ended problems.
It starts in the fall of 2025.
And what we see, especially with open-ended problems, is that into 2026, as the new models Mythos and Claude Opus 4.7 are introduced, the success rate jumps from the low 20 percent range to around 70 percent.
Okay, notice that the graph only starts in the fall of 2025.
We'll come back to this in more detail in a second, but here's why. Suppose you want AI to solve a hard coding problem. The example they gave was: the system we programmed is crashing, so why is it crashing?
And you send the AI off to figure out why.
You can't actually have data points from before the fall of 2025, because that's when Anthropic, along with OpenAI, released their first mature versions of what's called a coding harness.
That's a control program that lets you leverage an LLM plus a lot of hard-coded logic to actually tackle multi-step plans.
So you can't really have data from before then, because there was nothing you could ask to try to solve a problem like that.
You couldn't just prompt ChatGPT or Claude to do it.
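To make the idea of a "coding harness" more concrete, here is a minimal sketch of the general pattern described above: a control loop that repeatedly asks a model what to do next, runs hard-coded tools on its behalf, and feeds the results back until the model says it is finished. This is not how Claude Code or any OpenAI product is actually implemented. Every name here (`Action`, `run_harness`, `scripted_model`, the `read_log` tool, `crash.log`) is hypothetical, and the "model" is a scripted stand-in rather than a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    tool: str      # name of the tool to run, or "finish" to stop
    argument: str  # input for the tool, or the final answer when finishing


# Hypothetical stand-in for an LLM: reads the history, returns the next action.
Model = Callable[[list[str]], Action]


def run_harness(task: str, model: Model,
                tools: dict[str, Callable[[str], str]],
                max_steps: int = 10) -> str:
    """Drive a multi-step plan: ask the model, run the tool, record the result."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = model(history)
        if action.tool == "finish":
            return action.argument
        tool = tools.get(action.tool)
        # Hard-coded logic: unknown tools produce an error the model can see.
        if tool is None:
            observation = f"error: unknown tool {action.tool!r}"
        else:
            observation = tool(action.argument)
        history.append(f"{action.tool}({action.argument}) -> {observation}")
    return "gave up: step limit reached"


# Scripted "model" for illustration: read the log first, then report what it found.
def scripted_model(history: list[str]) -> Action:
    if len(history) == 1:
        return Action("read_log", "crash.log")
    return Action("finish", "crash caused by: " + history[-1].split("-> ")[1])


fake_files = {"crash.log": "NullPointer in parse_config"}
tools = {"read_log": lambda name: fake_files.get(name, "file not found")}
result = run_harness("Why is the system crashing?", scripted_model, tools)
print(result)  # crash caused by: NullPointer in parse_config
```

The point of the sketch is the division of labor: the model only proposes the next step, while the surrounding program decides which tools exist, executes them, keeps the running history, and enforces a step limit. A plain chat prompt has none of that scaffolding, which is why a single prompt could not investigate a crash across several steps.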