Yann LeCun
Podcast Appearances
It's the first system that we have that learns good representations of video so that when you feed those representations to a supervised classifier head, it can tell you what action is taking place in the video with pretty good accuracy. So that's the first time we get something of that quality.
Yeah. We have also preliminary results that seem to indicate that the representation allows our system to tell whether the video is physically possible or completely impossible because some object disappeared or an object suddenly jumped from one location to another or changed shape or something.
Possibly. This is going to take a while before we get to that point, but there are robotic systems that are based on this idea. And what you need for this is a slightly modified version of this, where imagine that you have a complete video.
And what you're doing to this video is that you're either translating it in time towards the future, so you only see the beginning of the video, but you don't see the latter part of it that is in the original one. Or you just mask the second half of the video, for example. And then you train a JEPA system of the type I described to predict the representation of the full video from the shifted one.
But you also feed the predictor with an action. For example, the wheel is turned 10 degrees to the right or something. So if it's a dash cam in a car and you know the angle of the wheel, you should be able to predict to some extent what's going to happen to what you see.
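Editor's sketch: the setup described above (hide the second half of a video, encode what remains, and predict the representation of the full video given an action) can be illustrated with a toy forward pass. Everything here is an assumption made for illustration: the function names `encode`, `predict`, `latent_loss`, and `jepa_loss`, the tiny dimensions, the random linear weights, and the use of a single number (10.0, standing in for a wheel angle in degrees) as the action. The sketch also reuses one encoder for both the visible clip and the full video and does no training, so it omits whatever a practical system would need to keep the representations from collapsing.

```python
import random

def encode(frames, weights):
    """Toy encoder: mean-pool the frames, then apply a linear map."""
    dim = len(frames[0])
    pooled = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]

def predict(context_repr, action, weights):
    """Toy predictor: linear map over the context representation plus the action."""
    inp = context_repr + [action]
    return [sum(w * x for w, x in zip(row, inp)) for row in weights]

def latent_loss(pred, target):
    """Mean squared error in representation space, not in pixel space."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def jepa_loss(video, action, enc_w, pred_w):
    visible = video[: len(video) // 2]   # mask the second half of the video
    context = encode(visible, enc_w)     # representation of what is seen
    target = encode(video, enc_w)        # representation of the full video
    return latent_loss(predict(context, action, pred_w), target)

# Hypothetical sizes and random data, chosen only to make the sketch runnable.
rng = random.Random(0)
FRAME_DIM, REPR_DIM = 6, 3
enc_w = [[rng.uniform(-1, 1) for _ in range(FRAME_DIM)] for _ in range(REPR_DIM)]
pred_w = [[rng.uniform(-1, 1) for _ in range(REPR_DIM + 1)] for _ in range(REPR_DIM)]
video = [[rng.uniform(0, 1) for _ in range(FRAME_DIM)] for _ in range(8)]

loss = jepa_loss(video, action=10.0, enc_w=enc_w, pred_w=pred_w)
print(loss)
```

The point of the sketch is where the error is measured: the predictor is scored against the representation of the full video, so it never has to reproduce the pixel-level details of objects entering the view.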
You're not going to be able to predict all the details of objects that appear in the view, obviously, but at an abstract representation level, you can probably predict what's going to happen. So now what you have is...
an internal model that says, here is my idea of the state of the world at time t, here is an action I'm taking, here is a prediction of the state of the world at time t plus one, t plus delta t, t plus two seconds, whatever it is. If you have a model of this type, you can use it for planning.
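Editor's sketch: one simple way to use a model of the form "state at time t, plus an action, gives predicted state at time t plus one" for planning is to roll it forward over candidate action sequences and keep the sequence whose predicted outcome is closest to a goal. The code below is a minimal illustration under stated assumptions, not the method discussed in the episode: `world_model` is a hypothetical hand-written stand-in for a learned predictor, the state is a single number, and the search is exhaustive over a small discrete action set.

```python
from itertools import product

def world_model(state, action):
    """Hypothetical stand-in for a learned predictor: s_{t+1} = f(s_t, a_t)."""
    return state + action

def rollout(state, actions, model):
    """Apply the model step by step to predict the state after a sequence of actions."""
    for a in actions:
        state = model(state, a)
    return state

def plan(state, goal, model, candidate_actions, horizon):
    """Try every action sequence of the given length; return the one whose
    predicted final state is closest to the goal."""
    return min(
        product(candidate_actions, repeat=horizon),
        key=lambda seq: abs(rollout(state, seq, model) - goal),
    )

best = plan(state=0.0, goal=3.0, model=world_model,
            candidate_actions=(-1.0, 0.0, 1.0), horizon=3)
print(best)  # (1.0, 1.0, 1.0)
```

Exhaustive search only works for toy problems like this one; with a longer horizon or continuous actions, some sampling or optimization procedure over action sequences would be needed instead.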