Andy Halliday
So there's these multiple personalities in there.
It's me adding some context here.
So what happens then is these start talking to each other.
There's conversational behavior, including question answering, perspective shifts, and the reconciliation of conflicting views.
So this is what they're watching happen inside a reasoning model.
Okay, and then socio-emotional roles that characterize sharp back-and-forth conversations are together accounting for the accuracy advantage in reasoning tasks.
So that sentence tells you that this dialogue, or trialogue, or multi-agent dialogue inside the reasoning process of a deep neural network is giving better accuracy in reasoning tasks when it is present.
But that's what they're observing.
Controlled reinforcement learning experiments revealed that base models increase conversational behavior when rewarded solely for reasoning accuracy.
So they're using reinforcement learning and saying, look, accuracy is what you need to optimize for.
Well, if they do that, then conversations happen between the multiple personalities inside the deep neural network structure.
Wow.
OK, these findings.
Here's the wrap-up one.
These findings indicate that the social organization of thought enables effective exploration of solution spaces, and that reasoning models establish a computational parallel to collective intelligence in human groups, where diversity enables superior problem solving when systematically structured.
And this suggests new opportunities for agent organization to harness the wisdom of crowds.
All the way back to the wisdom of crowds thesis that James Surowiecki put forward in his book.