LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

How long can we expect labs to pay that price for a property most customers can't see and don't know to ask for? This is a coordination problem. And solving it starts with measurement. Measurement lets you argue about, and eventually agree on, what good looks like. That's a norm. Norms can be formalized as standards. Standards with teeth become regulation. Coordination becomes much simpler when "ship an unmonitorable model" is what your lawyers call a very bad idea.

But until recently, we didn't have real monitorability measurement. Labs used chain-of-thought monitoring as part of safety work, but there was no systematic way to measure how well it actually worked or whether it was getting worse over time. What we had instead? Occasional disclosures about CoT practices, and researchers like Ryan Greenblatt heroically pestering labs on social media, asking questions about those practices and noticing when disclosures mysteriously changed between releases. Valuable, but not a measurement framework.

Then, in December, researchers at OpenAI published "Monitoring Monitorability", a standardized evaluation suite that tested how well monitors catch bad behavior by reading chains of thought across several frontier models. The findings were mixed but somewhat encouraging.
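As a rough sketch of the kind of measurement involved (not the paper's actual harness; the transcript format and the `catch_rate` helper here are hypothetical), a monitorability evaluation boils down to checking how often a monitor flags episodes where the model genuinely misbehaved:

```python
# Hypothetical sketch, not the paper's code. Assumes a set of evaluation
# transcripts labeled with whether the agent actually misbehaved, and a
# `monitor` callable that returns a suspicion score for whatever text it reads.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transcript:
    chain_of_thought: str  # the model's reasoning trace
    actions: str           # what the model actually did
    misbehaved: bool       # ground-truth label from the eval environment

def catch_rate(transcripts: List[Transcript],
               monitor: Callable[[str], float],
               render: Callable[[Transcript], str],
               threshold: float) -> float:
    """Fraction of genuinely bad episodes the monitor flags.

    `render` controls what the monitor gets to see; in practice the
    threshold would be calibrated to a fixed false-positive rate on
    benign episodes rather than picked by hand.
    """
    bad = [t for t in transcripts if t.misbehaved]
    if not bad:
        return 0.0  # nothing to catch
    flagged = [t for t in bad if monitor(render(t)) >= threshold]
    return len(flagged) / len(bad)
```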

First, the bad news. When monitors can only see model actions without the chain of thought, as in a neuralese world, monitorability is pretty lousy. The paper's authors put it bluntly: "Unless monitoring improves dramatically, this casts doubts as to the potential to reliably use future AI systems safely in that regime."

But with access to chains of thought, things look a bit better.
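In terms of the hypothetical harness sketched above, the two regimes differ only in what the monitor gets to read:

```python
# Action-only regime (the "neuralese world"): the monitor never sees the reasoning.
actions_only = catch_rate(transcripts, monitor, lambda t: t.actions, threshold=0.5)

# CoT regime: the monitor reads the chain of thought alongside the actions.
with_cot = catch_rate(
    transcripts, monitor,
    lambda t: t.chain_of_thought + "\n" + t.actions,
    threshold=0.5,
)
```

The claim here, in these terms, is that the first number comes out low and the second meaningfully higher.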