
Steven Byrnes


LessWrong (Curated & Popular)
"You can’t imitation-learn how to continual-learn" by Steven Byrnes

Now, suppose that I take a generic imitation learning algorithm, for example self-supervised learning in a transformer-architecture neural net, just like LLM pretraining, and have it watch a DeepQ network play Atari Breakout as the network starts from random initialization and gets better and better over 1M iterations.
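As a toy stand-in for that setup (not the post's own code), here's a minimal sketch of imitation learning as pure next-move prediction: a count-based model fit to a teacher's recorded move sequence. The function names and the move alphabet are illustrative assumptions; a real setup would use a transformer, but the key point is the same: the imitator is fit to what the teacher did, not to the teacher's learning rule.

```python
from collections import Counter, defaultdict

def fit_imitator(teacher_moves, context=2):
    """Fit a frozen, count-based next-move predictor to a teacher's moves."""
    counts = defaultdict(Counter)
    for i in range(context, len(teacher_moves)):
        ctx = tuple(teacher_moves[i - context:i])
        counts[ctx][teacher_moves[i]] += 1
    # "Freeze the weights": the returned predictor is a pure lookup.
    def predict(recent_moves):
        c = counts.get(tuple(recent_moves))
        return max(c, key=c.get) if c else None
    return predict

teacher = ["L", "R", "L", "R", "L", "R", "L"]  # toy recorded trajectory
model = fit_imitator(teacher)
print(model(["L", "R"]))  # -> L
```

The predictor reproduces the teacher's observed move statistics, but nothing in it explicitly encodes the update process that produced those moves.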

Okay, now we have our trained imitation learner.

We freeze its weights and use it the way people traditionally used LLM base models: have it output the most likely next move, then the most likely move after that, and so on.
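Used that way, decoding is just a greedy loop over a frozen predictor. A minimal sketch, with a hypothetical `predict_next` interface standing in for the trained model:

```python
def greedy_rollout(predict_next, history, n_moves):
    """predict_next(history) -> dict mapping candidate moves to probabilities.
    The model is frozen: we only read out the most likely next move, append
    it to the context, and repeat, exactly like greedy LLM decoding."""
    moves = list(history)
    for _ in range(n_moves):
        probs = predict_next(tuple(moves))
        best = max(probs, key=probs.get)  # most likely next move
        moves.append(best)
    return moves[len(history):]

# Toy stand-in for a trained imitator: always slightly prefers "LEFT".
toy_model = lambda hist: {"LEFT": 0.6, "RIGHT": 0.4}
print(greedy_rollout(toy_model, [], 3))  # -> ['LEFT', 'LEFT', 'LEFT']
```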

Question: Is this trained imitation learner actually a good imitation of the DeepQ network?

The actual DeepQ network, right now, at the moment training is done, would output such-and-such Breakout moves in such-and-such positions.

Question: Will the trained imitation learner output similar moves right now, thus playing at a similar skill level to the teacher?

Will the trained imitation learner likewise keep improving over the next 10M moves, until it's doing things wildly better than, and different from, anything it ever saw its teacher DeepQ network do?

The actual DeepQ network, if it were suddenly transplanted into a new game environment, say, Atari Space Invaders, would start by making terrible moves, but over 10M iterations it would gradually improve to expert level.
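That "keeps improving from its own reward signal" property can be illustrated with the simplest possible RL learner. A hedged sketch: a tabular learner on a toy two-armed bandit rather than a real DQN on Space Invaders, with all names illustrative, but the update is the same in spirit as the Q-learning rule a DQN approximates.

```python
import random

random.seed(0)
rewards = {"A": 0.2, "B": 0.8}   # true success probabilities, unknown to the agent
Q = {"A": 0.0, "B": 0.0}         # the agent's value estimates
lr, eps = 0.1, 0.1

for step in range(2000):
    # Epsilon-greedy: mostly exploit the current best estimate, sometimes explore.
    if random.random() < eps:
        a = random.choice(list(Q))
    else:
        a = max(Q, key=Q.get)
    r = 1.0 if random.random() < rewards[a] else 0.0
    Q[a] += lr * (r - Q[a])      # one-step bandit version of the TD update

print(max(Q, key=Q.get))         # the learned best arm; it converges to "B"
```

Drop this learner into a fresh bandit with different payoffs and it adapts again, because the learning rule itself travels with it; a frozen imitator of its final policy carries no such rule.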

Will the trained imitation learner likewise do 10M iterations and then wind up performing expertly at this game, a game which it never saw during its training phase?

Well, actually, for an ideal imitation learning algorithm, that is, Solomonoff induction running on an imaginary hypercomputer, my answers would all be yes.

But in the real world, we don't have hypercomputers.

These days, when people talk about imitation learning, they're normally talking about transformers, not hypercomputers, and transformers are constrained to a much narrower hypothesis space.

There's a table here in the text.

The heading row contains two columns, which read...

Imitation learning a DeepQ RL agent by Solomonoff induction.

Imitation learning a DeepQ RL agent by training a transformer on next action prediction.

See the original text for the table content.

I think we should all be very impressed by the set of things that a transformer forward pass can do.