Dwarkesh Patel
Podcast Appearances
Andrej, that was great.
Yeah, thank you.
Thanks.
Hey, everybody.
I hope you enjoyed that episode.
If you did, the most helpful thing you can do is just share it with other people who you think might enjoy it.
It's also helpful if you leave a rating or a comment on whatever platform you're listening on.
If you're interested in sponsoring the podcast, you can reach out at dwarkesh.com/advertise.
Otherwise, I'll see you in the next one.
Well, it's not saying that you just want to throw away as much compute as you possibly can.
The Bitter Lesson says that you want to come up with techniques which most effectively and scalably leverage compute.
Most of the compute that's spent on an LLM is used in running it during deployment.
And yet it's not learning anything during this entire period.
It's only learning during this special phase that we call training.
And so this is obviously not an effective use of compute.
And what's even worse is that this training period by itself is highly inefficient because these models are usually trained on the equivalent of tens of thousands of years of human experience.
And what's more, during this training phase, all of their learning is coming straight from human data.
Now, this is an obvious point in the case of pre-training data, but it's even kind of true for the RLVR that we do with these LLMs.
These RL environments are human-furnished playgrounds to teach LLMs the specific skills that we have prescribed for them.