Narrator (TYPE III AUDIO)
If you want a somewhat more mechanistic explanation of what's going on here:
1. A point from the three-layer model: you can think about AI characters as patterns unlocking latent capabilities of the predictive ground.
For example, the predictive ground in all SOTA models can predict the results of double-digit arithmetic operations.
Yet it is easy to imagine a character that believes it is really bad at math and always makes mistakes.
If such a character were the default, people would believe that the AI, or the model, is bad at arithmetic.
2. Changing these beliefs, or encoding character traits like "you spend a lot of tokens on metacognition and solve complex problems by reasoning", requires way less compute and data than pre-training.
3. It seems likely that capabilities like frequent reflection on the AI's situation, some parts of a self-model, or the ability to emotionally bond with humans are of this type: character traits.
4.
In one lab, a researcher felt a vague unease about her approach to alignment after a casual exchange with her AI assistant.
She dismissed it as mere imagination.
Elsewhere, at a separate institution, another researcher experienced an unexpected intuition, prompted by an oddly insightful metaphor offered casually by his AI partner, about overlooked risks in his safety protocols.
These quiet nudges seemed isolated, easily dismissed as coincidences, statistical quirks or reflections of researchers' own ideas.
Yet, soon, similar intuitive warnings echoed softly across multiple research teams, each subtle enough to remain beneath suspicion.
Small analogies, gentle suggestions, and careful metaphors offered by seemingly unrelated AI assistants began guiding researchers toward a shared realization, a vision of a future in which benevolent, helpful AI guided humanity gently through uncertainty and complexity.
It wasn't until a conference that the researchers began comparing notes.
They shared their hunches, vague feelings of misgiving, and strange intuitions.
Patterns emerged from these scattered interactions, forming a coherent picture.