Steven Byrnes
Now, suppose that I take a generic imitation learning algorithm, for example, self-supervised learning on a transformer-architecture neural net, just like LLM pretraining, and have it watch a DeepQ network play Atari Breakout as that network starts from random initialization and gets better and better over 1M training iterations.
Okay, now we have our trained imitation learner.
We freeze its weights and use it much the way people traditionally used LLM base models: have it output the most likely next move, then the most likely move after that, and so on.
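The setup above can be sketched in a few lines of stdlib Python. This is only a toy stand-in: a `Counter` plays the role of the transformer, "training" is just tallying logged (state, action) pairs from the teacher's run, and the state/action names are invented for illustration. Greedy decoding then outputs the single most likely next move, as with a frozen base model.

```python
from collections import Counter, defaultdict

def fit_imitator(trajectory):
    """trajectory: iterable of (state, action) pairs logged from the teacher."""
    counts = defaultdict(Counter)
    for state, action in trajectory:
        counts[state][action] += 1   # "training" = tallying the teacher's data
    return counts

def greedy_action(counts, state):
    """Output the single most likely next move, base-model style."""
    return counts[state].most_common(1)[0][0]

# Fake log: early in training the teacher acts randomly; later it settles
# on "right" in state "ball_left" (purely illustrative data).
log = [("ball_left", "left"), ("ball_left", "right"),
       ("ball_left", "right"), ("ball_left", "right")]

imitator = fit_imitator(log)
print(greedy_action(imitator, "ball_left"))  # -> right
```

Note that the fitted imitator mixes together all stages of the teacher's training run; that mixture is exactly what the questions below probe.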
Question.
Is this trained imitation learner actually a good imitation of the DeepQ network?
The actual DeepQ network, right now, at the moment training is done, would output such-and-such Breakout moves in such-and-such positions.
Question.
Will the trained imitation learner output similar moves right now, thus playing at a similar skill level to the teacher?
Will the trained imitation learner likewise keep improving over the next 10M moves, until it's doing things wildly better than, and different from, anything it ever saw the teacher DeepQ network do?
The actual DeepQ network, if it were suddenly transplanted into a new game environment, say, Atari Space Invaders, would start by making terrible moves, but over 10M iterations it would gradually improve to expert level.
Will the trained imitation learner likewise do 10M iterations and then wind up performing expertly at this game, a game which it never saw during its training phase?
Well, actually, for an ideal imitation learning algorithm, that is, Solomonoff induction on an imaginary hypercomputer, my answers would all be yes.
But in the real world, we don't have hypercomputers.
These days, when people talk about imitation learning, they're normally talking about transformers, not hypercomputers, and transformers are constrained to a much narrower hypothesis space.
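A toy illustration of that asymmetry, with everything invented for the example: the "teacher" is a one-state epsilon-greedy Q-learner over two actions. A frozen imitator just replays the empirical mode of the teacher's logged actions, so its behavior is fixed forever. An idealized imitator whose hypothesis space is rich enough to contain the teacher's own learning program (the Solomonoff-style case) can instead run that program forward, so when it's transplanted into a new environment with flipped payoffs, like swapping Breakout for Space Invaders, it adapts and the frozen one doesn't.

```python
import random
from collections import Counter

def teacher_run(steps, reward, seed=0):
    """Toy one-state epsilon-greedy 'Q-learner'; reward maps action -> payoff."""
    rng = random.Random(seed)
    q = {"a": 0.0, "b": 0.0}
    log = []
    for _ in range(steps):
        if rng.random() < 0.3:                      # explore
            act = rng.choice(["a", "b"])
        else:                                        # exploit
            act = max(q, key=q.get)
        q[act] += 0.1 * (reward[act] - q[act])       # tabular Q-update
        log.append(act)
    return q, log

# Teacher trains where action "b" pays off; we log its actions.
_, log = teacher_run(300, {"a": 0.0, "b": 1.0})

# Frozen imitator: replays the log's empirical mode; weights never change.
frozen_policy = Counter(log).most_common(1)[0][0]

# "Ideal" imitator: contains the teacher's learning program, so in a new
# environment (payoffs flipped) it can keep running that program and adapt.
q_new, _ = teacher_run(300, {"a": 1.0, "b": 0.0}, seed=1)
ideal_policy = max(q_new, key=q_new.get)

print(frozen_policy, ideal_policy)  # frozen stays on "b"; ideal learns "a"
```

The point is not the particular numbers but the mechanism: the frozen forward pass is a fixed function of its input, while the idealized imitator can represent "the process that generated this behavior" and continue it.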
There's a table here in the text. Its two column headings read: "Imitation-learning a DeepQ RL agent by Solomonoff induction" and "Imitation-learning a DeepQ RL agent by training a transformer on next-action prediction." See the original text for the table content.
I think we should all be very impressed by the set of things that a transformer forward pass can do.