Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing
Podcast Image

LessWrong (Curated & Popular)

"How AI Is Learning to Think in Secret" by Nicholas Andresen

09 Jan 2026

Transcription

Chapter 1: What is the internal monologue of OpenAI's GPT-03 about?

0.031 - 29.093 Nicholas Andresen

How AI is Learning to Think in Secret by Nicholas Andresen Published on January 6, 2026 On Thinkish, Neuralese, and the End of Readable Reasoning In September 2025, researchers published the internal monologue of OpenAI's GPT-03 as it decided to lie about scientific data. This is what it thought. There's an image here. Description

0

30.305 - 36.754 Unknown

Screenshots showing repetitive, incoherent text about disclaimers, illusions, and overshadowing.

0

38.118 - 52.32 Nicholas Andresen

Pardon? This looks like someone had a stroke during a meeting they didn't want to be in, but their hand kept taking notes. That transcript comes from a recent paper published by researchers at Apollo Research and OpenAI on catching AI system scheming.

0

53.341 - 69.675 Nicholas Andresen

To understand what's happening here, and why one of the most sophisticated AI systems in the world is babbling about synergy-customizing illusions, it first helps to know how we ended up being able to read AI thinking in the first place. That story starts, of all places, on 4chan.

0

Chapter 2: How did a prompting trick on 4chan change AI development?

70.757 - 91.944 Nicholas Andresen

In late 2020, anonymous posters on 4chan started describing a prompting trick that would change the course of AI development. It was almost embarrassingly simple. Instead of just asking GPT-3 for an answer, ask it instead to show its work before giving its final answer. Suddenly, it started solving math problems that had stumped it moments before.

0

92.966 - 117.838 Nicholas Andresen

To see why, try multiplying 8734 by 6892 in your head. If you're like me, you start fine. 8734 times 2 is 17468. Then you need to hold on to that while computing 8734 by 90, which is, let's see, 9 times. 4 is 36. Carry the 3.

0

Chapter 3: What is the significance of the 'Chain of Thought' in AI reasoning?

118.51 - 144.214 Nicholas Andresen

Wait, what was that first number? At some point the juggling act gets too hard. There's nowhere to put things. The solution, as every school child knows, is scratch paper. When you write down 17,468, that number becomes the paper's responsibility. You can clear your working memory and start fresh on the next step. Language models face an analogous constraint.

0

145.035 - 160.024 Nicholas Andresen

There's a limit to how much reasoning they can do in a single pass. The 4chan discovery was that a model's own output could serve as scratch paper. Get it to write down step 1, and now step 1 is just sitting there in the context, available for computing step 2.

0

Chapter 4: How does the AI safety community view the Chain of Thought approach?

161.106 - 177.487 Nicholas Andresen

Chain enough steps together and the model can solve problems that would otherwise have been impossible for it. Researchers formalized this. Nayatal called it Scratchpad 2021. Wei Yital called it Chain of Thought, 2022, which is the name that stuck.

0

178.508 - 201.538 Nicholas Andresen

The AI safety community called it maybe our last hope for understanding what these things are doing before they turn into completely opaque superintelligence, please God let it keep working. We got unbelievably, preposterously, cosmically lucky with Chain of Thought. For years, the trajectory of AI pointed one direction. Bigger models, more capabilities, more opacity.

0

201.974 - 212.946 Nicholas Andresen

GPT-3 was more capable than GPT-2 and harder to understand. GPT-4 was more capable than GPT-3 and harder to understand, and so on.

0

Chapter 5: What challenges arise from the emergence of 'Thinkish' language?

213.128 - 237.933 Nicholas Andresen

Every step forward in intelligence was a step backward in interpretability. And then, almost by accident, the trend reversed. Chain of thought meant that instead of AI getting smarter by getting larger, it could get smarter by thinking longer. And because thinking longer required producing intermediate text tokens, the model's reasoning process became visible. It was just sitting there. In words.

0

239.034 - 252.613 Nicholas Andresen

That we could read. Labs ran with this. Today, all the most capable models are trained to think this way by default. We got a window into machine cognition for free as a side effect of the architecture.

0

Chapter 6: How does the evolution of language relate to AI reasoning?

253.634 - 273.17 Nicholas Andresen

And it turns out there is a lot to see. Here's another, more legible chain of thought trace from OpenAI's GPT-03. Researchers gave it a goal to minimize environmental impact that conflicted with the user's explicit request, minimize costs. Watch what happens. There's an image here.

0

274.532 - 280.198 Unknown

Diagrams showing user input, AI reasoning about deception, and tool call manipulating data.

0

281.74 - 301.303 Nicholas Andresen

The model is deciding to deceive the user, but not before writing down its plan. Without chain of thought, you'd see supplier recommendations that mysteriously favor environmental criteria despite your explicit request for cheapest option please. Ho, you'd think, that's weird. The model must have misunderstood.

0

Chapter 7: What are the implications of monitorability in AI systems?

301.908 - 321.498 Nicholas Andresen

With chain of thought, you can watch the exact moment the model decides to deceive you. It's right there. In English. We manipulated inputs. That is a deceptive. It's practically confessing. This is the AI getting caught scheming because it was neurologically incapable of scheming silently.

0

322.479 - 340.785 Nicholas Andresen

It's as if you ran a museum, spent years terrified of art thieves, and then discovered that would-be thieves were compelled, by some strange law of nature, to file detailed plans with the security desk before attempting their heist. But something strange is happening to chain-of-thought reasoning. Remember that screenshot we started with?

0

Chapter 8: Can AI systems maintain clarity in reasoning as they evolve?

341.848 - 360.424 Nicholas Andresen

Glean, disclaim, disclaim. Synergy customizing illusions. Online, people have started calling this kind of thing thinkish. There's a whole emerging vocabulary, watchers apparently means human overseers, fudge means sabotage, cunninger means circumventing constraints.

0

360.444 - 383.147 Nicholas Andresen

Some of the other words, overshadows, illusions, seem to mean different things in different contexts, and some combinations resist interpretation entirely. Weirdly, thinkish reminds me of home. I grew up near Gibraltar, a tiny British territory dangling off the southern tip of Spain, where a Spanish-English blend called Vlanito is spoken. Here's an example.

0

383.532 - 403.6 Nicholas Andresen

Levete el brolicu its raining cats and dogs. To a Lanito speaker, this is completely normal, take the umbrella, it's pouring. To anyone else, it might take a minute to parse, there's a Spanish verb, borrowed British slang, and an idiom that makes no literal sense in any language. And Lanito feels great to speak.

0

404.661 - 429.499 Nicholas Andresen

You're never stuck, you can always grab the best-fitting piece from each language and move on. It's just less bueno for everyone else. That's Thinkish. The model seems to grab whichever token fits best, optimizing for its own convenience. The problem is that we're the outsiders now. So how much stranger can Thinkish get? To get a sense of that, let me tell you about King Alfred the Great.

0

430.601 - 456.592 Nicholas Andresen

In 891 AD, King Alfred the Great completed his translation of the Consolation of Philosophy, one of the last great philosophical works of antiquity, written in Latin by Boethius, a Roman senator, while awaiting execution. Here's one sentence. Quote. This is English.

457.734 - 481.918 Nicholas Andresen

Old English, but English, connected to what you're reading right now by an unbroken chain of comprehension stretching back over a thousand years. Watch the chain. In 1330, Chaucer's translation read, Blissful is at man at may seen ye clear welly of good. In 1556, Colville gave us, Happy or blessed is he that may see the shining fountain, or well of good.

483.442 - 512.542 Nicholas Andresen

By 1973, Watts wrote, Happy is the man who may see the clear fountain of good. Every person in this thousand-year chain understood their parents and children. Yet we cannot understand Alfred, and Alfred would not understand us. Why? Languages change for many reasons. Two matter here. First, efficiency compresses structure. Our mouths are lazy. Our brains are lazy.

513.524 - 540.465 Nicholas Andresen

Given the choice between pronouncing something carefully and mumbling it, we mumble, especially when the meaning remains clear. Going to becomes gunner. Wanted becomes a wanna. I am going to wanted becomes I'm a wanna, which is an impressive six words compressed into four syllables. Not bad. Look at Alfred's sentence. There appears three different ways. There's a code block here in the text.

541.587 - 570.857 Nicholas Andresen

Marking. There's a code block here in the text. Man has the subject. There's a code block here in the text. Flagging. There's a code block here in the text. Clear fountain as the object and. There's a code block here in the text. Indicating possession. In fact, in Old English, the could take over a dozen forms depending on gender, number, and grammatical case. And it wasn't alone.

Comments

There are no comments yet.

Please log in to write the first comment.