Steven Byrnes
But we should not expect a transformer forward pass to reproduce a full-fledged, entirely different, learning algorithm, with its own particular neural network architecture, its own particular methods of updating and querying weights, etc., as it runs and changes over millions of steps.
Running one large-scale learning algorithm is expensive enough.
It's impractical to run a huge ensemble of different large-scale learning algorithms in parallel in order to zero in on the right one.
I'm going to harp on this because it's a point of confusion.
There are two learning algorithms under discussion: the imitation learning algorithm (for example, a transformer getting updated by gradient descent on next-action prediction), and the target continual learning algorithm (for example, a deep Q-network getting updated by TD learning).
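To make the contrast concrete, here is a minimal sketch of the two update loops. The linear maps stand in for the transformer and the deep Q-network respectively; the shapes, learning rates, and squared-error loss are illustrative assumptions, not details from the post.

```python
import numpy as np

# --- Learning algorithm 1: imitation learning ---
# A model (a linear map standing in for a transformer) is trained by
# gradient descent to predict the expert's next action from the state.
def imitation_update(W, state, expert_action, lr=0.1):
    pred = W @ state
    grad = np.outer(pred - expert_action, state)  # gradient of squared error
    return W - lr * grad

# --- Learning algorithm 2: the target continual learning algorithm ---
# A Q-function (a linear map standing in for a deep Q-network) is updated
# by TD learning on (state, action, reward, next_state) transitions.
def td_update(Q, s, a, r, s_next, gamma=0.9, lr=0.1):
    target = r + gamma * np.max(Q @ s_next)  # bootstrapped TD target
    td_error = target - (Q @ s)[a]
    Q = Q.copy()
    Q[a] += lr * td_error * s  # move Q(s, a) toward the target
    return Q
```

The point of the post is that these are different algorithms with different weight-update rules: once imitation training ends, `W` is frozen, but imitating the target system would require reproducing the cumulative effect of millions of `td_update`-style weight changes using activations alone.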
When the imitation learning is done, the transformer weights are frozen, and the corresponding trained model is given the impossible task of using only its activations, with fixed weights, to imitate what happens when the target continual learning algorithm changes its weights over millions of steps of, in this case, TD learning.
That's the part I'm skeptical of.
In other words...
The only practical way to know what happens after millions of steps of some scaled-up continual learning algorithm is to actually do millions of steps of that same scaled-up continual learning algorithm, with actual weights getting actually changed in specifically designed ways via PyTorch code.
And then that's the scaled-up learning algorithm you're running.
Which means you're not doing imitation learning.
So back to the human case.
Take a typical person; call him Joe. I think LLMs are good at imitating Joe today, and good at imitating Joe-plus-one-month-of-learning-introductory-category-theory, but they can't imitate the process by which Joe grows and changes over that month of learning. Or at least, they can't imitate it in a way that would generalize to imitating a person spending years building up a whole new field of knowledge that isn't in the training data.
Some things that are off-topic for this post
As mentioned at the top, I'm hoping that this post is a narrow pedagogical point.