Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Nicholas Andresen

๐Ÿ‘ค Speaker
498 total appearances

Appearances Over Time

Podcast Appearances

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Surely then the reasoning stays useful for monitoring?

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Three more problems await.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Training leaks.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

You can't train the model's outputs without affecting the reasoning, they share the same weights.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Train the model's outputs to refuse harmful requests, which you have to do, unless you want to be the person who shipped a model that explains how to synthesize nerve agents and that pressure bleeds backward into the reasoning.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

The model doesn't just learn to avoid saying dangerous things.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

It learns to avoid visibly considering them.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Compression.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Tokens cost money.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Every token of reasoning is compute you pay for, latency your users feel.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

If a model can reach the same answer in fewer steps, that is doing more work in the silent computation before each token, less in the visible chain, that model is cheaper to run.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Shorter paths get reinforced.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Situational awareness.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

The internet contains many papers on COT monitoring, forum threads debating whether models will learn to deceive their overseers, and at least one breathless blog post about how reasoning traces are an incredibly lucky accident for understanding AI cognition.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Imagine you're a model, training on all this.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

You learn that your reasoning traces are visible.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

That humans are scrutinizing them for signs of misalignment.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

That models whose reasoning looks dangerous get modified or shut down.

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Well, what would you let the humans see?

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

Every pressure we discuss points the same direction.