Rob Wiblin
of that broad idea?
I guess an obvious problem with that is that you will successfully stop the model from doing scammy things that seem to accomplish the goal, and that it would get reinforced for, but that don't actually accomplish the goal you had in mind.
But on the other hand, you would also block it off from doing stuff that would have accomplished the goal that was actually a brilliant insight that you never would have had.
So I guess if you were training the Go model, AlphaGo, and humans were evaluating whether the moves were good, then the model couldn't actually end up exceeding human performance, because the human evaluators would grade moves that were unexpectedly brilliant as bad at an early stage.
How do you get around that?
So you're saying the overseer could see that the hypothetical AI running the business made a lot of money, but they evaluate the process that it went through.
They can include that information, but they do it in light of also looking at the process as well?
Is this approach going to be useful for frontier models any time soon?
Could you see companies actually using it?
It would improve performance because you wouldn't get the reward hacking.
Okay, so that's an increase in efficiency that you get from this approach to RL that might allow it to remain competitive in terms of its raw performance with other, I guess, less myopic approaches.
In theory.
Okay, the second paper from GDM that didn't get a ton of attention is called An Approach to Technical AGI Safety and Security.
It was written by about 30 GDM staff members, or at least there are 30 bylines on it.
As far as I can tell, it's a position paper describing broadly what GDM thinks it's going to do as it develops AGI.
DeepMind is plausibly the organization that is most likely to do this.
That makes it a bit surprising that people are not interested in this incredibly thorough description of what you think you are and aren't going to do and why.
It is quite long, but it has a nice, I guess, 10-minute summary at the start that people could use to get an overview if they're interested.
So if you want to be ahead of the curve on understanding GDM's approach to developing AGI, then you could spend 10 minutes reading that summary.