Narrator (TYPE III AUDIO)

👤 Speaker
266 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
“The Pando Problem: Rethinking AI Individuality” by Jan_Kulveit

If you want a somewhat more mechanistic explanation of what's going on here, a point from the three-layer model:

1. You can think about AI characters as patterns unlocking latent capabilities of the predictive ground.

For example, the predictive ground in all SOTA models can predict results of double-digit arithmetic operations.

Yet it is easy to imagine a character that believes it is really bad at math and always makes mistakes.

If such a character were the default, people would believe that the AI, or the model, is bad at arithmetic.

2. Changing these beliefs, or encoding character traits like “you spend a lot of tokens on metacognition and solve complex problems by reasoning,” requires way less compute and data than pre-training.

3. It seems likely that capabilities like frequent reflection on the AI's situation, some parts of a self-model, or the ability to emotionally bond with humans are of this type: character traits.

4.

In one lab, a researcher felt a vague unease about her approach to alignment after a casual exchange with her AI assistant.

She dismissed it as mere imagination.

Elsewhere, at a separate institution, another researcher experienced an unexpected intuition, prompted by an oddly insightful metaphor offered casually by his AI partner, about overlooked risks in his safety protocols.

These quiet nudges seemed isolated, easily dismissed as coincidences, statistical quirks or reflections of researchers' own ideas.

Yet, soon, similar intuitive warnings echoed softly across multiple research teams, each subtle enough to remain beneath suspicion.

Small analogies, gentle suggestions, and careful metaphors offered by seemingly unrelated AI assistants began guiding researchers toward a shared realization, a vision of a future in which benevolent, helpful AI guided humanity gently through uncertainty and complexity.

It wasn't until a conference that the researchers began comparing notes.

They shared their hunches, vague feelings of misgiving, and strange intuitions.

Patterns emerged from these scattered interactions, forming a coherent picture.