Dwarkesh Patel
robust out-of-distribution kind of capabilities.
And so I wonder if the trek over the next 5-10 years will just be like: each subtask, you have to give it thousands of episodes, and then it's very hard to actually automate much work just by doing subtasks.
So if you think about what a barista does, what a waiter does, what a chef does, very little of it involves just sitting at one station and doing stuff, right?
You've got to move around, you've got to restock, you've got to fix the machine, et cetera, go between the counter and the cashier and the machine, et cetera.
So...
Will there just be this long tail of skills that you have to keep manually adding episodes for, and labeling, and seeing how well they did, etc.?
Or is there some reason to think that it will progress more generally than that?
Right.
I had an example like this when I got a tour of the robots, by the way, at your office.
So it was folding shorts.
And I don't know if there was an episode like this in the training set, but just for fun, I took one of the shorts and turned it inside out.
And then it was able to understand that it first needed to get... So first of all, the grippers are just like this, like two limbs, or just an opposable finger and thumb-like thing.
And it's actually shocking how much you can do with just that.
Yeah, it understood that it first needed to fold it inside out before folding it correctly.
I mean, what's especially surprising about that is, it seems like this model only has one second of context.
So as compared to these language models, which can often see the entire code base, and they're observing hundreds of thousands of tokens and thinking about them before outputting, and they're observing their own chain of thought for thousands of tokens before making a plan about how to code something up, your model is seeing one image of what happened in the last second.
And it vaguely knows it's supposed to fold this short.