Dwarkesh Patel
Podcast Appearances
I'm like learning about my clients, my company, all this information.
Yeah.
Yeah.
So...
The question I'm trying to ask is, you need some way of getting, like, how many bits per second are you picking up?
Like, is a human picking up when they're, you know, out in the world, right?
If you're just, like, interacting over Slack with your clients and everything.
Yeah.
So what is the learning process which helps you capture that information?
Yeah.
One of my friends, Toby Ord, pointed out that if you look at the MuZero models that Google DeepMind deployed to learn Atari games, these models were initially not a general intelligence itself, but a general framework for training specialized intelligences to play specific games.
That is to say that you couldn't, using that framework, train a policy to play both chess and Go and some other game.
You had to train each one in a specialized way.
And he was wondering whether that implies that reinforcement learning generally, because of this information constraint, you can only learn one thing at a time, the density of information isn't that high, or whether it was just specific to the way that MuZero was done.
And if it's specific to AlphaZero, what needed to be changed about that approach so that it could be a general learning agent?
So maybe it would be useful to explain what was missing in that architecture or that approach, which this continual learning AGI would have.
Yeah, I guess I'm curious about...
Historically, have we seen the level of transfer using RL techniques that would be needed to build this kind of... Okay, good, good.
Let me paraphrase to make sure that I understood that correctly.