Dr. Terry Sejnowski
Podcast Appearances
But one of the things, though, is that learning is involved. And this is really a problem that we cracked, first theoretically in the 90s and then experimentally later by recording from neurons and also brain imaging in humans. So it turns out we know the algorithm that is used in the brain for how to learn sequences of actions to achieve a goal.
And it's the simplest possible algorithm you can imagine. It's simply to predict the next reward you're going to get. If I do an action, will it give me something of value? And you learn: every time you try something, whether you got the amount of reward you expected or less, you use that to update the synapses,
synaptic plasticity, so that the next time you'll have a better chance of getting a better reward, and you build up what's called a value function. So the cortex, over your lifetime, is building up a lot of knowledge about things that are good for you and things that are bad for you. Like you go to a restaurant, you order something, how do you know what's good for you, right?
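The update described here, comparing the reward you got with the reward you expected and nudging your estimate by the difference, is commonly formalized as temporal-difference learning. What follows is only a minimal sketch under that assumption, not the model discussed in the conversation: the function name `td0_chain`, the five-step chain of states, the learning rate `alpha`, and the discount `gamma` are all hypothetical choices made for illustration.

```python
def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.9, reward=1.0):
    """Learn a value for each state of a straight-line chain using TD(0).

    Illustrative assumption: the agent always steps from state s to s + 1,
    and only the final step pays a reward.
    """
    # One extra slot for the terminal state, whose value stays at 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            # Prediction error: (reward received + discounted value expected
            # next) minus the value that was predicted for this state.
            delta = r + gamma * values[s + 1] - values[s]
            # Nudge the prediction toward what actually happened.
            values[s] += alpha * delta
    return values[:n_states]


if __name__ == "__main__":
    for state, value in enumerate(td0_chain()):
        print(f"state {state}: value {value:.3f}")
```

With these made-up settings, the learned values rise as the states get closer to the reward, which is the sense in which a value function accumulates experience about what tends to pay off.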
You've had lots of meals in a lot of places, and now that is part of your value function. This is the same algorithm that was used by AlphaGo. This is the program that DeepMind built. This is an AI program that beat the world Go champion. And Go is the most complex game that humans have ever created, played on a regular basis.
Yeah, that's right. So Go is to chess what chess is to something like checkers. In other words, the level of difficulty is way above it, because you have to think in terms of battles going on all over the place at the same time. And the order in which you put the pieces down is going to affect what's going to happen in the future.
What you identified is a very important feature, which is that, by the way, every time you do something, you're updating this value function, every time. And it accumulates. And the answer to your first question is that it's always going to be there. It doesn't matter. It's a very permanent part of your experience and who you are.
And interestingly, and the behaviorists knew this back in the 1950s, you can get there two ways by trial and error. Small rewards are good because you're constantly coming closer and closer to getting what you're seeking, being a better tennis player or being able to make a friend, but the negative, punishment, is much more effective: one-trial learning.