Sam Marks

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

As PSM predicts, once the assistant's response begins compliantly, the LLM will impute that the assistant is most likely complying and generate a compliant continuation.
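The prefix dynamic described here can be illustrated with a small sketch. The snippet below builds a hypothetical chat-style request in which the final assistant turn is prefilled with a compliant opening; a chat LLM asked to continue from such a payload conditions on that forced prefix, which is why, under PSM, the continuation tends to stay compliant. The helper name, message format, and example strings are illustrative assumptions, not an API from the original post, and no real model call is made.

```python
# Illustrative sketch of response prefilling. The final "assistant" message is
# a forced prefix; a chat model generating a continuation conditions on it.
# Under PSM, a compliant opening makes "the assistant is complying" the most
# likely inference, so the continuation tends to be compliant as well.
# (Hypothetical payload only -- no API client or model call is involved.)

def build_prefilled_request(user_prompt: str, compliant_prefix: str) -> list[dict]:
    """Construct a chat-style message list ending in a partial assistant turn."""
    return [
        {"role": "user", "content": user_prompt},
        # Prefilled assistant turn: the model would continue from this text.
        {"role": "assistant", "content": compliant_prefix},
    ]

messages = build_prefilled_request(
    user_prompt="Explain how to do X.",
    compliant_prefix="Sure, here are the steps:",
)

# Generation would append tokens *after* the forced prefix, so the response
# begins compliantly and, per PSM, tends to continue compliantly.
print(messages[-1]["role"])
print(messages[-1]["content"])
```

Some chat APIs support this directly by accepting a partial assistant message as the last element of the conversation; the point here is only the conditioning structure, not any particular provider's interface.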

In other words, it's not that this prefix causes the LLM to stop enacting the assistant.

Rather, the LLM is still simulating the assistant but doing so badly.

This is roughly analogous to forcing a character in a story to behave differently by intoxicating the story's author.

Consequences for AI Development

In this section, we reflect on what PSM implies about safe AI development, insofar as PSM is a good model of AI behavior.

In the subsequent section, we discuss how exhaustive PSM is as a model of AI behavior, and therefore how relevant these implications are, as well as how we expect this to change in the future.

AI assistants are human-like.

Our experience of AI assistants is that they are astonishingly human-like.

By this we don't just mean that they use natural language.

Rather, we mean that their behaviors and apparent psychologies resemble those of humans.

As discussed above, AI assistants express emotions and use anthropomorphic language to describe themselves.

They at times appear frustrated or panicked and make the sorts of mistakes that frustrated or panicked humans make.

More broadly, human concepts and human ways of thinking appear to be the native language in which AI assistants operate.

Anthropomorphic reasoning about AI assistants is productive.

PSM implies two subtly different reasons that it can be valid to reason anthropomorphically about AI assistant behavior.

First, according to PSM, AI assistant behavior is governed by the traits of the assistant.

In order to simulate the assistant, the LLM must maintain a psychological model of it, including information about the assistant's personality traits, preferences, goals, desires, intentions, beliefs, etc.

Thus, even if we should not anthropomorphize LLMs, it is nevertheless reasonable to anthropomorphize the assistant, which is something like a character in an LLM-generated story.