Sam Marks

"The persona selection model" by Sam Marks

Here we discuss cases where AI assistants behave in non-human-like ways.

While these cases are, on their face, in tension with PSM, we think that overall they have compelling PSM-compatible explanations.

LessWrong (Curated & Popular)

Nevertheless, we think these case studies are useful for demonstrating what can and cannot be inferred from PSM.

Roughly speaking, we hypothesize that the behaviors we discuss are caused by LLMs having limited capabilities or buggy behavior, which distorts their rendition of the assistant.

That is, the LLM is trying to simulate the assistant, but its execution is limited by capabilities.

LLMs sometimes make mistakes that are not very human-like, for example stating that 9.11 is greater than 9.9 despite generally having advanced mathematical capabilities, producing bizarre responses to altered versions of well-known riddles (see, for example, the Altered Riddles dataset), or failing at simple character-counting tasks like counting the R's in "strawberry".

These un-human-like behaviors might appear to contradict PSM, which generally expects AI assistants to display human-like behavior.

However, we hypothesize that these examples are better understood as arising from the limited capabilities of the underlying LLM.

Suppose that we observe a character in a story state that water boils at 50 degrees Celsius.