Nicholas Andresen

LessWrong (Curated & Popular)

"How AI Is Learning to Think in Secret" by Nicholas Andresen

You're not training on CoT (you followed your policy), but you acted on what you saw.

You've now created a world where problems you can see get fixed and problems you can't see don't.

Do that for long enough and what do you get?

Models whose problems don't show up in the reasoning trace.
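
The selection effect described here can be made concrete with a toy calculation. The sketch below is only an illustration under loudly stated assumptions: problems are split into two fixed pools (visible in the reasoning trace, or hidden), only visible ones get fixed, and the function name, the starting counts, and the `fix_rate` are all hypothetical. It does not model a system learning to hide its reasoning; it shows only how fixing what you can see shifts the remaining problems toward the ones you can't.

```python
def visible_share_after_fixes(visible, hidden, fix_rate, rounds):
    """Return, per round, the share of remaining problems that are visible.

    Assumption (not from the source): each round fixes a constant fraction
    of the visible problems and never touches the hidden ones.
    """
    history = []
    for _ in range(rounds):
        visible *= (1 - fix_rate)  # only problems seen in the trace get fixed
        history.append(visible / (visible + hidden))
    return history


# Hypothetical starting point: 90 visible problems, 10 hidden ones.
history = visible_share_after_fixes(visible=90.0, hidden=10.0, fix_rate=0.5, rounds=5)
for round_number, share in enumerate(history, start=1):
    print(f"round {round_number}: {share:.0%} of remaining problems are visible")
```

Under these made-up numbers, the visible share falls every round, so most of what remains is exactly the kind of problem that does not show up in the trace.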

The same thing happens if the CoT influences anything you do train on.

Suppose you show users summaries of the model's reasoning and let them rate which response they prefer.

Users like confidence and clarity.

They don't like "Hmm, let me consider the unethical option before I explain why I'm rejecting it."

That's unsettling.

If the summary influences their rating (and why show it otherwise?), you're selecting for reasoning that looks reassuring rather than reasoning that's informative.

You ate the cookie.

At this point you might try a new plan.

Total quarantine.

The reasoning trace is sacred and untouchable.

Nobody trains on it.

Nobody acts on it.

You hire a guy whose only job is to yell "Don't look at the reasoning!" at anyone who gets too curious.
