Rob Wiblin
Then, I guess, empirically people point to the fact that models lie and scheme a bunch now.
They do a whole bunch of reward hacking as a result of reinforcement learning, and they expect that to perhaps just get worse over time because we don't have sufficient mitigations.
Do you basically find none of those, or any other similar arguments that people have put forward, sufficiently persuasive to think that it's likely?
Yeah, I think that's right.
Yeah, are there any other, I guess, common reasons that people think that catastrophic misalignment is likely that you want to quickly react to?
I struggle to know what to say, but there's Eliezer's take.
Yeah, it's...
I think across the world as a whole, for most people who are feeling really optimistic about how things are going to go, the biggest factor is just looking at the models that we have today and saying they seem really steerable.
They seem to do what I ask.
They seem to be probably nicer than people and more helpful than people in many respects.
How much is that steerability and seeming alignment of current-day models a factor in making you feel good?
Okay, I think we're going to push on from this topic of how severe or how likely a risk catastrophic misalignment is.
I feel like with many guests, we could fill the entire episode with just a lengthy discussion about this, but every episode would start to sound the same.
And I guess in the broader world, it's something that is debated a ton.
So for the rest of the conversation, I guess we're going to occupy the worldview that catastrophic misalignment is possible, but that prosaic alignment techniques, the kinds of things where we cross the river by feeling the stones, have a good shot at working here, and think about what that implies and how it's shaping the choices that you're making and that GDM is making.
Yep, sounds great.
So you are not enthusiastic about AI companies making firm safety or alignment commitments in response to public or political pressure, something that has been happening over the last couple of years.
Why is that?
Yeah, so there's this issue that the future is uncertain.