Grant Harvey
Well, everyone, thank you so much for joining us on the Neuron podcast.
If you enjoyed this conversation, please subscribe wherever you listen to your podcasts and check out the Neuron newsletter at theneuron.ai for daily AI insights.
Until next time, though, farewell, humans.
The way I think of it is, so large language models, they're not like traditional machine learning, right?
They're non-deterministic.
They're based on neural nets.
They do some funky things.
I think of a large language model like an enthusiastic teenager.
Yeah, he really wants to answer questions, and he wants to get smart.
But if you want a teenager to learn something, like there are some ways you can teach them, right?
You could, for example, you could take them to a library, or you could give them a load of homework, or you could set them a test.
We kind of do all those three things.
When you hear supervised fine-tuning or reinforcement learning from human feedback or evaluations, that's actually one of those three things.
So supervised fine-tuning is giving a model loads of real, high-quality examples of what its output should look like.
That's taking your model to the library and saying, here's some textbooks to read.
It's going to read the textbooks.
They'll tell you what's true.
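The "taking the model to the library" idea can be sketched in a few lines. This is a hedged toy, not a real fine-tuning setup: a hypothetical bigram counter stands in for the model, and incrementing counts stands in for gradient updates on curated example text.

```python
from collections import Counter, defaultdict

def fine_tune(model: defaultdict, examples: list[str]) -> None:
    """'Read the textbooks': update next-word statistics from curated examples."""
    for text in examples:
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1  # stand-in for a gradient step

def predict_next(model: defaultdict, word: str) -> str:
    """Return the continuation seen most often in the training examples."""
    return model[word].most_common(1)[0][0]

model = defaultdict(Counter)
fine_tune(model, [
    "the capital of france is paris",
    "the capital of japan is tokyo",
])
print(predict_next(model, "capital"))  # → "of"
```

The point of the analogy survives even in this toy: the model never gets told rules, it just imitates the high-quality text it was shown.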
Reinforcement learning is, okay, you're going to give the model some questions.
It'll give some answers, and you're going to say if those answers are good or not.
Like you might ask the model to write me a poem about...
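The homework-and-grades loop described here can also be sketched as a toy. Everything in it is illustrative, not a real RLHF pipeline: a scoring function simulates the human rater, and well-rated answers simply get their sampling weight bumped up.

```python
import random

def human_feedback(answer: str) -> float:
    """Stand-in for a human rater: rewards one phrasing over the other."""
    return 1.0 if answer.endswith("bright") else 0.0

def reinforce(preferences: dict, candidates: list[str], rounds: int = 50) -> None:
    """Sample answers, collect feedback, and reinforce the well-rated ones."""
    rng = random.Random(0)  # seeded so the toy run is reproducible
    for _ in range(rounds):
        weights = [preferences[c] for c in candidates]
        answer = rng.choices(candidates, weights=weights)[0]
        preferences[answer] += human_feedback(answer)  # raise odds of good answers

candidates = ["the stars are bright", "the stars are blah"]
preferences = {c: 1.0 for c in candidates}
reinforce(preferences, candidates)
best = max(preferences, key=preferences.get)
print(best)
```

After a few dozen rounds the well-rated answer dominates the preferences, which is the core of the idea: the model isn't shown correct answers directly, it just learns which of its own answers people liked.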