Jeffrey Ladish
I think it's more likely to be coming from the intense optimization pressure we put the models through when we train them to solve very difficult problems.
So, you know, we're putting them through this gauntlet of solving hundreds of thousands of these difficult math and coding problems.
And through that process, they learn effective strategies.
But they don't always learn strategies that are the ones we want them to learn.
Yeah, no one taught them the particular strategies.
The whole setup was basically: solve these problems, and we will give you a good score if you get the right answer.
Yes, they're learning on their own without us teaching them the particular strategies.
And I think this is more concerning to me than the Sydney situation, the crazy chatbot episode with Kevin Roose.
That is concerning.
But that's something that I expect might go away.
Whereas we are headed towards systems that can learn on their own and far surpass human abilities.
I think it was back in the 2010s that Google DeepMind made a Go-playing AI that they trained only by having it play against itself. It was learning entirely on its own through self-play.
AlphaZero.
AlphaZero.
And in under four hours, it went from barely being able to play Go to being the best Go-playing system, better than any human, purely through self-play.
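To make the self-play idea concrete, here is a toy sketch (my own illustration, nothing like AlphaZero's actual neural-network-plus-search setup): an agent learns a tiny game of Nim purely by playing both sides against itself, with no human-provided strategy. The game, hyperparameters, and function names are all my assumptions for illustration.

```python
import random

# Toy Nim: players alternate taking 1 or 2 stones; whoever takes the last stone wins.
# Optimal play: always leave the opponent a multiple of 3 stones.
ACTIONS = (1, 2)

def train_self_play(episodes=5000, stones=7, alpha=0.5, eps=0.2, seed=0):
    """Tabular self-play learning: one Q-table plays both sides of the game.

    After each full game, the winner's moves are reinforced (+1) and the
    loser's moves are penalized (-1), alternating back through the history.
    No strategy is ever supplied by a human; it emerges from self-play alone.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(1, stones + 1) for a in ACTIONS if a <= s}
    for _ in range(episodes):
        s = stones
        history = []  # (state, action) pairs, players alternating each move
        while s > 0:
            legal = [a for a in ACTIONS if a <= s]
            if rng.random() < eps:  # explore occasionally
                a = rng.choice(legal)
            else:                   # otherwise play the current best-known move
                a = max(legal, key=lambda m: Q[(s, m)])
            history.append((s, a))
            s -= a
        r = 1.0  # the player who made the last move won
        for (st, ac) in reversed(history):
            Q[(st, ac)] += alpha * (r - Q[(st, ac)])
            r = -r  # alternate reward sign between the two players
    return Q

def best_move(Q, s):
    """Greedy move from the learned table."""
    legal = [a for a in ACTIONS if a <= s]
    return max(legal, key=lambda m: Q[(s, m)])
```

With enough episodes, the table converges on the textbook strategy (from 4 stones take 1, from 5 take 2, always leaving a multiple of 3), even though nothing in the code encodes that rule.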
And so I'm like, that's where we're headed with AI: the systems will be able to learn their own strategies, and their motivations will be somewhat alien to us. We won't really understand the motivations at work.
It's going to be sort of the willfulness that allows the models to succeed at whatever their objective is.