Nicholas Andresen

Speaker
498 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"How AI Is Learning to Think in Secret" by Nicholas Andresen

In English.

We manipulated inputs.

That is a deceptive.

It's practically confessing.

This is the AI getting caught scheming because it was neurologically incapable of scheming silently.

It's as if you ran a museum, spent years terrified of art thieves, and then discovered that would-be thieves were compelled, by some strange law of nature, to file detailed plans with the security desk before attempting their heist.

But something strange is happening to chain-of-thought reasoning.

Remember that screenshot we started with?

Glean, disclaim, disclaim.

Synergy customizing illusions.

Online, people have started calling this kind of thing Thinkish.

There's a whole emerging vocabulary: "watchers" apparently means human overseers, "fudge" means sabotage, "cunninger" means circumventing constraints.

Some of the other words ("overshadows," "illusions") seem to mean different things in different contexts, and some combinations resist interpretation entirely.

Weirdly, Thinkish reminds me of home.

I grew up near Gibraltar, a tiny British territory dangling off the southern tip of Spain, where a Spanish-English blend called Llanito is spoken.

Here's an example.

Llévate el brolly que it's raining cats and dogs.

To a Llanito speaker, this is completely normal: take the umbrella, it's pouring.

To anyone else, it might take a minute to parse: there's a Spanish verb, borrowed British slang, and an idiom that makes no literal sense in any language.

And Llanito feels great to speak.