Ilya Sutskever
And so now you've got this great competitive programmer.
And with this analogy, I think it's more intuitive: okay, so if it's so well-trained, it's like all the different algorithms and all the different proof techniques are right at its fingertips.
And it's more intuitive that, with this level of preparation, it will not necessarily generalize to other things.
I think it's the it factor.
Yeah.
Right?
When I was an undergrad, I remember there was a student like this who studied with me.
So I know it exists.
Like the main strength of pre-training is that there is A, so much of it.
Yeah.
And B, you don't have to think hard about what data to put into pre-training.
And it's very natural data, and it does include in it a lot of what people do, people's thoughts, a lot of the features of, you know, it's like the whole world as projected by people onto text.
And pre-training tries to capture that using a huge amount of data.
Pre-training is very difficult to reason about, because it's so hard to understand the manner in which the model relies on pre-training data.
And whenever the model makes a mistake, could it be because something, by chance, is not as well supported by the pre-training data?
You know, and "supported by pre-training" is maybe a loose term.
I don't know if I can add anything more useful on this, but I don't think there is a human analog to pre-training.
I think there are some similarities between pre-training and both of those, and pre-training tries to play the role of both of them.