Emad Mostaque
Podcast Appearances
And it's more likely to be aligned as a result of that than if you train it on the whole internet and crap.
Is it a panacea?
Is it perfect?
No.
Do you know the story of Buddha?
But then the reward function of the internet is not necessarily the reward function that we would like to teach our kids or try to teach a general purpose AI.
They can interact with that, but they can learn how to adapt to it.
Just like if you raise your kids well and you show them the internet, they should be able to deal with it.
You can give AI values.
This is the reinforcement learning function.
I'd say there's not much of a difference there.
I said you can embed things in the AI so it acts in certain ways.
You can expose it to the internet, but again, we have something called curriculum learning in AI, whereby we literally teach it one thing and then increment it with something else, and something else, and something else.
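Curriculum learning, as described here, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular model was trained: the `difficulty` scoring function and the `num_stages` count are hypothetical parameters chosen for the example. Each stage keeps everything from the previous stage and adds the next slice of harder material.

```python
from typing import Callable, Iterator, List, Sequence


def curriculum_stages(
    examples: Sequence[str],
    difficulty: Callable[[str], float],
    num_stages: int,
) -> Iterator[List[str]]:
    """Yield cumulative training pools, easiest examples first.

    Stage k holds the easiest k/num_stages fraction of the data, so each
    stage keeps what came before and adds harder material on top.
    `difficulty` is a hypothetical, caller-supplied scoring function.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Sort once by the (assumed) difficulty score, easiest first.
    ordered = sorted(examples, key=difficulty)
    n = len(ordered)
    for stage in range(1, num_stages + 1):
        # Integer ceiling of stage * n / num_stages, so the last stage is the full set.
        cutoff = -(-stage * n // num_stages)
        yield ordered[:cutoff]


# Usage: word count as a crude, hypothetical stand-in for difficulty.
docs = ["the cat sat on the mat", "a cat", "the cat sat", "a"]
for pool in curriculum_stages(docs, lambda s: len(s.split()), num_stages=2):
    print(pool)
```

Real curricula use far richer difficulty signals; word count is used here only to keep the sketch self-contained.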
How are we teaching these things? What are we teaching, and in what order? Do we start with all of the internet and then distill it down? That's how we're doing it right now. Or do we teach it a whole bunch of high-quality stuff and then augment it from there? We already have evidence, from the TinyStories paper and the Phi paper from Microsoft, that you can have a far more efficient AI if you only teach it high-quality things, so you don't have to tell it: ignore that, ignore that, don't answer like that, don't say that.
Yeah, exactly.
You can just teach it a good base, and then it goes from there.
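The "good base first, then augment" idea can be pictured as a two-phase data schedule. The sketch below is a hypothetical illustration, not the method used in TinyStories, Phi, or any specific model: the pool names, step counts, and `broad_fraction` mixing ratio are all assumptions. Phase one draws only from a curated, high-quality pool; phase two blends in broader data at a chosen rate.

```python
import random
from typing import List, Sequence


def two_phase_schedule(
    curated: Sequence[str],
    broad: Sequence[str],
    base_steps: int,
    augment_steps: int,
    broad_fraction: float,
    seed: int = 0,
) -> List[str]:
    """Return one training example per step (hypothetical schedule).

    Phase 1 (base): sample only from the curated, high-quality pool.
    Phase 2 (augment): each step draws from the broad pool with
    probability broad_fraction, otherwise from the curated pool.
    """
    if not 0.0 <= broad_fraction <= 1.0:
        raise ValueError("broad_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = [rng.choice(curated) for _ in range(base_steps)]
    for _ in range(augment_steps):
        pool = broad if rng.random() < broad_fraction else curated
        schedule.append(rng.choice(pool))
    return schedule


# Usage: 5 base steps on curated data, then 10 steps with roughly 30% broad data.
plan = two_phase_schedule(["good_1", "good_2"], ["web_1", "web_2"], 5, 10, 0.3)
print(plan)
```

Keeping the base phase purely curated is the point of the design: the broad data only arrives once the model already has a foundation to interpret it against.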
And it scores higher on human evaluations and other metrics, but we don't know what the right dataset is.
It's just that, right now, we've said: let's scale.
More data, more compute.
Now we're like, what's the right data?