Today, position carries that signal instead.
Transmission fidelity is also why gesælig, Old English for "blessed, happy," took a thousand years to become silly.
Every incremental change had to preserve enough clarity for children to learn from parents, for merchants to trade, for lovers to court.
Chain of thought has no children.
No merchants.
No lovers.
Remember, chain of thought isn't communication.
It's computational scratch paper that happens to be in English because that's what the training data was.
Whether a human can follow the reasoning has no effect on whether it produces correct answers.
If anything, human readability is a handicap: English is full of redundancy and ambiguity that waste tokens.
The evolutionary pressure is toward whatever representation works best for the model, and human-readable English is nowhere near optimal.
That's pressure toward drift, and unlike with human languages, there's little pushing back.
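To make that asymmetry concrete, here is a minimal sketch in toy Python (not any lab's actual training code) of an outcome-only reward. The chain of thought affects the score solely through the final answer it produces, so nothing in the objective favors the legible transcript; the transcripts and the "ANSWER:" convention below are hypothetical.

```python
# Toy sketch: outcome-based reward scores ONLY the final answer.
# The scratch-paper tokens before it contribute to reward solely through
# whatever computation they enable, so gradient descent is free to
# repurpose them into whatever representation works best for the model.

def reward(transcript: str, correct_answer: str) -> float:
    """Return 1.0 if the final answer is right, else 0.0.

    Note what is absent: no term for whether the reasoning before
    'ANSWER:' is grammatical English, or English at all.
    """
    final = transcript.rsplit("ANSWER:", 1)[-1].strip()
    return 1.0 if final == correct_answer else 0.0

# Two hypothetical transcripts that earn identical reward:
legible = "17 times 3 is 51, plus 4 is 55. ANSWER: 55"
thinkish = "17*3→51 ⊕4 ⇒ ANSWER: 55"

assert reward(legible, "55") == reward(thinkish, "55") == 1.0
# Identical reward means identical gradient signal: nothing here selects
# for the human-readable version over the compressed one.
```

Under an objective like this, legibility doesn't fall out of correctness; any pressure toward it has to be added as an explicit extra term.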
Old English became unrecognizable despite maximum selection pressure for mutual comprehensibility, though it took a thousand years.
Remove that selection pressure, and what happens?
Gradient descent can be very, very fast.
This is already happening.
OpenAI's o3 drifted furthest into Thinkish, its CoT nearly unreadable in places, though more recent releases (GPT-5, GPT-5.1, GPT-5.2) have gotten progressively more legible, suggesting deliberate correction.
DeepSeek's chains of thought sometimes code-switch into Chinese mid-proof.
Anthropic's and Gemini's remain largely plain English.
For now.