Steven Byrnes

👤 Speaker
266 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"You can’t imitation-learn how to continual-learn" by Steven Byrnes

But we should not expect a transformer forward pass to reproduce a full-fledged, entirely different learning algorithm, with its own particular neural network architecture, its own particular methods of updating and querying weights, etc., as it runs and changes over millions of steps.

Running one large-scale learning algorithm is expensive enough.

It's impractical to run a huge ensemble of different large-scale learning algorithms in parallel in order to zero in on the right one.

I'm going to harp on this because it's a point of confusion.

There are two learning algorithms under discussion: the imitation learning algorithm (for example, a transformer getting updated by gradient descent on next-action prediction) and the target continual learning algorithm (for example, a deep Q-network getting updated by TD learning).
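The contrast between these two algorithms can be sketched as two different update rules. The following toy code is illustrative only: the 1-D "weights" and all function names are hypothetical stand-ins, not the actual architectures under discussion.

```python
# Toy contrast between the two learning algorithms. Everything here is a
# hypothetical 1-D stand-in, not a real transformer or deep Q-network.

def td_update(w, reward, q_now, q_next, lr=0.1, gamma=0.9):
    """Target continual-learning algorithm: one online TD-learning step,
    as used to train a deep Q-network. Weights change on every step."""
    td_error = reward + gamma * q_next - q_now
    return w + lr * td_error

def imitation_update(w, predicted_action, demonstrated_action, lr=0.1):
    """Imitation-learning algorithm: one gradient-descent step on a
    squared next-action-prediction error. After training, these weights
    are frozen at inference time."""
    grad = 2 * (predicted_action - demonstrated_action)
    return w - lr * grad
```

The key asymmetry the post relies on: the first rule keeps firing at deployment time, while the second one stops once training ends.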

When the imitation learning is done, the transformer weights are frozen, and the corresponding trained model is given the impossible task of using only its activations, with fixed weights, to imitate what happens when the target continual learning algorithm changes its weights over millions of steps of, in this case, TD learning.

That's the part I'm skeptical of.

In other words: the only practical way to know what happens after millions of steps of some scaled-up continual learning algorithm is to actually run millions of steps of that same scaled-up continual learning algorithm, with real weights actually being changed in specifically designed ways via PyTorch code.
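A minimal sketch of this claim: the weight reached after N online TD steps is produced by iterating the update N times. The environment signal, constants, and function name below are all hypothetical; in this 1-D toy a closed form happens to exist, but for a real network it would not.

```python
def run_continual_learning(n_steps, lr=0.01, gamma=0.9):
    """Iterate a toy online TD update for n_steps and return the final
    weight. Hypothetical 1-D stand-in for a scaled-up learner."""
    w = 0.0
    for _ in range(n_steps):
        reward = 1.0          # stand-in for an environment signal
        q_now, q_next = w, w  # stand-in value estimates
        w = w + lr * (reward + gamma * q_next - q_now)  # TD update
    return w
```

A frozen forward pass would have to predict `run_continual_learning(1_000_000)` without ever executing the loop; the post's point is that there is no shortcut around actually running it.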

And then that's the scaled-up learning algorithm you're running, which means you're not doing imitation learning.

So, back to the human case: consider a typical person; call him Joe.

I think LLMs are good at imitating Joe today, and good at imitating Joe plus one month of learning introductory category theory, but they can't imitate the process by which Joe grows and changes over that month of learning.

Or at least, they can't imitate it in a way that would generalize to imitating a person spending years building up a completely different field of knowledge that's not in the training data.

Some things that are off-topic for this post

As mentioned at the top, I'm hoping that this post makes a narrow pedagogical point.