Nicholas Andresen

Speaker

498 total appearances

Podcast Appearances

LessWrong (Curated & Popular)

"How AI Is Learning to Think in Secret" by Nicholas Andresen

It's the AI equivalent of "I won't be mad, I just want to know what happened to the vase."

1533.096

This is how you don't eat the cookie.

1538.324

Not by trying to keep the box closed.

1540.748

What you need to do is not keep cookies in the house.

1543.587

Or, if you must have cookies, bake them with anchovies.

1547.134

Make them taste so bad you don't want them anymore.

1550.762

But not eating the cookie only addresses selection pressures.

1553.928

The pressures that come from us.

1557.095

Situational awareness creates pressure that comes from the model itself.

1559.399

Closing hiding places helps, but a sufficiently smart model that's actively looking will find gaps we missed.

1563.905

At that point, what matters is whether the model actually wants to be honest.

1570.494

Values good enough to survive the realization that honesty is optional. Anthropic's soul document, for example, is one attempt at instilling these.

1574.479

And maybe whether you're the kind of person a model would want to be honest with.

1582.971

The hope is that monitorable CoT buys you enough time to figure this out.

1587.091

But even if chain of thought stays monitorable, there's a worse problem waiting.

1591.655

Everything we've talked about so far assumes the model reasons in text.

1596.46

But consider:

1600.844

How do people actually think?

1602.285

When Magnus Carlsen plays chess, he doesn't narrate to himself, "OK, I'm considering moving the knight because I'm somewhat concerned about the diagonal and I'm experiencing a moderate sense of pressure on the queenside."

1604.667

No, he looks at the board and knows.

1616.318