Steven Byrnes
But we should not expect a transformer forward pass to reproduce a full-fledged, entirely different, learning algorithm, with its own particular neural network architecture, its own particular methods of updating and querying weights, etc., as it runs and changes over millions of steps.
Running one large-scale learning algorithm is expensive enough.
It's impractical to run a huge ensemble of different large-scale learning algorithms in parallel in order to zero in on the right one.
I'm going to harp on this because it's a point of confusion.
There are two learning algorithms under discussion: the imitation learning algorithm (for example, a transformer getting updated by gradient descent on next-action prediction), and the target continual learning algorithm (for example, a deep Q-network getting updated by TD learning).
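To make the contrast concrete, here is a minimal sketch of the two update loops. The linear maps stand in for the transformer and the deep Q-network respectively; the shapes, learning rates, and squared-error loss are illustrative assumptions, not details from the post.

```python
import numpy as np

# --- Learning algorithm 1: imitation learning ---
# A model (a linear map standing in for a transformer) is trained by
# gradient descent to predict the expert's next action from the state.
def imitation_update(W, state, expert_action, lr=0.1):
    pred = W @ state
    grad = np.outer(pred - expert_action, state)  # gradient of squared error
    return W - lr * grad

# --- Learning algorithm 2: the target continual learning algorithm ---
# A Q-function (a linear map standing in for a deep Q-network) is updated
# by TD learning on (state, action, reward, next_state) transitions.
def td_update(Q, s, a, r, s_next, gamma=0.9, lr=0.1):
    target = r + gamma * np.max(Q @ s_next)  # bootstrapped TD target
    td_error = target - (Q @ s)[a]
    Q = Q.copy()
    Q[a] += lr * td_error * s  # move Q(s, a) toward the target
    return Q
```

The point of the post is that these are different algorithms with different weight-update rules: once imitation training ends, `W` is frozen, but imitating the target system would require reproducing the cumulative effect of millions of `td_update`-style weight changes using activations alone.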
When the imitation learning is done, the transformer weights are frozen, and the corresponding trained model is given the impossible task of using only its activations, with fixed weights, to imitate what happens when the target continual learning algorithm changes its weights over millions of steps of, in this case, TD learning.
That's the part I'm skeptical of.
In other words...
The only practical way to know what happens after millions of steps of some scaled-up continual learning algorithm is to actually do millions of steps of that same scaled-up continual learning algorithm, with actual weights getting actually changed in specifically designed ways via PyTorch code.
And then that's the scaled-up learning algorithm you're running.
Which means you're not doing imitation learning.
So back to the human case.
Take a typical person; call him Joe. I think LLMs are good at imitating Joe today, and good at imitating Joe-plus-one-month-of-learning-introductory-category-theory, but they can't imitate the process by which Joe grows and changes over that month of learning. Or at least, they can't imitate it in a way that would generalize to imitating a person spending years building up a whole new field of knowledge that isn't in the training data.
Some things that are off-topic for this post
As mentioned at the top, I'm hoping that this post is a narrow pedagogical point.