Yoshua Bengio
And now that's credit assignment used for learning.
So current nets are doing a fairly good job for sequences with dozens or, say, hundreds of time steps.
And then it gets harder and harder, depending on what you have to remember and so on, as you consider longer durations, whereas humans seem to be able to do credit assignment across essentially arbitrary time spans. Like, I could remember something I did last year.
And now, because I see some new evidence, I'm going to change my mind about the way I was thinking last year, and hopefully not make the same mistake again.
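[To make the mechanics concrete, here is a minimal PyTorch sketch of truncated backpropagation through time, the standard way recurrent nets do credit assignment across time steps. The truncation window caps how far back credit can flow, which is one face of the limitation described above. All sizes and names are illustrative, not from this conversation.]

```python
# Truncated BPTT sketch: gradients flow at most `chunk` steps back in time.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=1e-2)

seq = torch.randn(1, 1000, 8)        # one long sequence: 1000 time steps
targets = torch.randn(1, 1000, 1)
chunk = 50                           # truncation window (hypothetical choice)

h = None
for t in range(0, seq.size(1), chunk):
    x = seq[:, t:t + chunk]
    y = targets[:, t:t + chunk]
    out, h = rnn(x, h)
    loss = nn.functional.mse_loss(readout(out), y)
    opt.zero_grad()
    loss.backward()                  # credit assignment within the window only
    opt.step()
    h = h.detach()                   # cut the graph: no credit beyond the window
```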
Yes, so there's a selection of what we remember.
And I think there are really cool connections to higher-level cognition here, regarding consciousness and emotions: deciding what comes to consciousness and what gets stored in memory, which are not trivial either.
So current state-of-the-art neural nets trained on large quantities of images or texts have some level of understanding of what explains those data sets, but it's very basic.
It's very low-level, and it's not nearly as robust, abstract, and general as our understanding.
Okay, so that doesn't tell us how to fix things, but I think it encourages us to think about how we can maybe train our neural nets differently so that they would focus, for example, on causal explanation, something that we don't do currently with neural net training.
Also, one thing I'll talk about in my talk this afternoon is instead of learning separately from images and videos on one hand and from text on the other hand, we need to do a better job of jointly learning about language and about the world to which it refers so that both sides can help each other.
We need to have good world models in our neural nets for them to really understand sentences which talk about what's going on in the world.
And I think we need language input to help provide clues about what high level concepts like semantic concepts should be represented at the top levels of these neural nets.
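[As one hedged illustration of what such joint learning can look like, though not necessarily the approach described in the talk, a CLIP-style contrastive objective aligns image embeddings with the text that refers to them, so each modality shapes the other's representation. The encoders and dimensions are placeholders.]

```python
# Contrastive image-text alignment (CLIP-style symmetric InfoNCE).
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (batch, dim) embeddings of paired images and captions."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # pairwise similarities
    labels = torch.arange(img_emb.size(0))         # matching pairs on the diagonal
    # Each image should retrieve its caption, and vice versa.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```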
In fact, there is evidence that the purely unsupervised learning of representations doesn't give rise to high-level representations that are as powerful as the ones we're getting from supervised learning.
And so the clues we're getting just from the labels, not even sentences, are already very powerful.
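[A common way claims like this are quantified is a linear probe: freeze the learned encoder and measure how well a simple linear classifier does on top of its features. The sketch below assumes a generic `encoder` and data `loader`; it is illustrative, not a specific experiment cited here.]

```python
# Linear probe: gauges how linearly separable the frozen features are.
import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(encoder, x):
    encoder.eval()
    return encoder(x)                # frozen: no gradient flows into the encoder

def train_linear_probe(encoder, loader, feat_dim, num_classes, epochs=5):
    probe = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            feats = extract_features(encoder, x)
            loss = nn.functional.cross_entropy(probe(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe                     # its accuracy scores the representation
```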
Can you elaborate slightly?
Of course, datasets and architectures are things you always want to play with, but I think the crucial thing is more the training objectives, the training frameworks.