Steven Byrnes
Now, suppose that I take a generic imitation learning algorithm, for example, self-supervised learning on a transformer-architecture neural net, just like LLM pretraining, and have it watch a DeepQ network play Atari Breakout as that network starts from random initialization and gets better and better over 1M training iterations.
Okay, now we have our trained imitation learner.
We freeze its weights and use it much the way people traditionally used LLM base models: have it output the most likely next move, then the most likely move after that, and so on.
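The setup above can be sketched in a few lines of stdlib Python. This is only a toy stand-in: a `Counter` plays the role of the transformer, "training" is just tallying logged (state, action) pairs from the teacher's run, and the state/action names are invented for illustration. Greedy decoding then outputs the single most likely next move, as with a frozen base model.

```python
from collections import Counter, defaultdict

def fit_imitator(trajectory):
    """trajectory: iterable of (state, action) pairs logged from the teacher."""
    counts = defaultdict(Counter)
    for state, action in trajectory:
        counts[state][action] += 1   # "training" = tallying the teacher's data
    return counts

def greedy_action(counts, state):
    """Output the single most likely next move, base-model style."""
    return counts[state].most_common(1)[0][0]

# Fake log: early in training the teacher acts randomly; later it settles
# on "right" in state "ball_left" (purely illustrative data).
log = [("ball_left", "left"), ("ball_left", "right"),
       ("ball_left", "right"), ("ball_left", "right")]

imitator = fit_imitator(log)
print(greedy_action(imitator, "ball_left"))  # -> right
```

Note that the fitted imitator mixes together all stages of the teacher's training run; that mixture is exactly what the questions below probe.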
Question.
Is this trained imitation learner actually a good imitation of the DeepQ network?
The actual DeepQ network, right now, at the moment training is done, would output such-and-such Breakout moves in such-and-such positions.
Question.
Will the trained imitation learner output similar moves right now, thus playing at a similar skill level to the teacher?
Will the trained imitation learner likewise keep improving over the next 10M moves, until it's doing things wildly better than, and different from, anything it ever saw the teacher DeepQ network do?
The actual DeepQ network, if it were suddenly transplanted into a new game environment, say, Atari Space Invaders, would start by making terrible moves, but over 10M iterations it would gradually improve to expert level.
Will the trained imitation learner likewise do 10M iterations and then wind up performing expertly at this game, a game which it never saw during its training phase?
Well, actually, for an ideal imitation learning algorithm, that is, Solomonoff induction on an imaginary hypercomputer, my answers would all be yes.
But in the real world, we don't have hypercomputers.
These days, when people talk about imitation learning, they're normally talking about transformers, not hypercomputers, and transformers are constrained to a much narrower hypothesis space.
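A toy illustration of that asymmetry, with everything invented for the example: the "teacher" is a one-state epsilon-greedy Q-learner over two actions. A frozen imitator just replays the empirical mode of the teacher's logged actions, so its behavior is fixed forever. An idealized imitator whose hypothesis space is rich enough to contain the teacher's own learning program (the Solomonoff-style case) can instead run that program forward, so when it's transplanted into a new environment with flipped payoffs, like swapping Breakout for Space Invaders, it adapts and the frozen one doesn't.

```python
import random
from collections import Counter

def teacher_run(steps, reward, seed=0):
    """Toy one-state epsilon-greedy 'Q-learner'; reward maps action -> payoff."""
    rng = random.Random(seed)
    q = {"a": 0.0, "b": 0.0}
    log = []
    for _ in range(steps):
        if rng.random() < 0.3:                      # explore
            act = rng.choice(["a", "b"])
        else:                                        # exploit
            act = max(q, key=q.get)
        q[act] += 0.1 * (reward[act] - q[act])       # tabular Q-update
        log.append(act)
    return q, log

# Teacher trains where action "b" pays off; we log its actions.
_, log = teacher_run(300, {"a": 0.0, "b": 1.0})

# Frozen imitator: replays the log's empirical mode; weights never change.
frozen_policy = Counter(log).most_common(1)[0][0]

# "Ideal" imitator: contains the teacher's learning program, so in a new
# environment (payoffs flipped) it can keep running that program and adapt.
q_new, _ = teacher_run(300, {"a": 1.0, "b": 0.0}, seed=1)
ideal_policy = max(q_new, key=q_new.get)

print(frozen_policy, ideal_policy)  # frozen stays on "b"; ideal learns "a"
```

The point is not the particular numbers but the mechanism: the frozen forward pass is a fixed function of its input, while the idealized imitator can represent "the process that generated this behavior" and continue it.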
There's a table here in the text. Its two column headings read: "Imitation-learning a DeepQ RL agent by Solomonoff induction" and "Imitation-learning a DeepQ RL agent by training a transformer on next-action prediction." See the original text for the table content.
I think we should all be very impressed by the set of things that a transformer forward pass can do.