Dwarkesh Patel
which have increased programmer productivity, but have not led to an explosion.
So that sounds very much like autocomplete tab.
And this other category is just like automation of the programmer.
And so it's interesting you're seeing more in the category of the historical analogies of better compilers or something.
One of the big problems with RL is that it's incredibly information sparse.
Labelbox can help you with this by increasing the amount of information that your agent gets to learn from with every single episode.
For example, one of their customers wanted to train a coding agent.
So Labelbox augmented an IDE with a bunch of extra data collection tools and staffed a team of expert software engineers from their Alignerr network to generate trajectories that were optimized for training.
Now, obviously these engineers evaluated these interactions on a pass-fail basis, but they also rated every single response on a bunch of different dimensions like readability and performance.
And they wrote down their thought processes for every single rating that they gave.
So you're basically showing every single step an engineer takes and every single thought that they have while they're doing their job.
And this is just something you could never get from usage data alone.
And so Labelbox packaged up all these evaluations and included all the agent trajectories and the corrective human edits for the customer to train on.
This is just one example, so go check out how Labelbox can get you high-quality frontier data across domains, modalities, and training paradigms.
Reach out at labelbox.com/dwarkesh.
Let's talk about RL a bit.
You tweeted some very interesting things about this.
Conceptually, how should we think about the way that humans are able to build a rich world model just from interacting with our environment, and in ways that seem almost irrespective of the final reward at the end of the episode?
If somebody's starting a business and at the end of 10 years she finds out whether the business succeeded or failed, we say that she's earned a bunch of wisdom and experience, but it's not because the log probs of every single thing that happened over the last 10 years are up-weighted or down-weighted.
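For readers unfamiliar with the "up-weighted or down-weighted" phrasing, here is a minimal sketch of what outcome-only reinforcement learning does with a trajectory. It assumes a plain REINFORCE-style update with a single terminal reward; the function names and the numbers are hypothetical and chosen only for illustration, not taken from the conversation.

```python
from typing import List


def outcome_only_weights(num_steps: int, final_reward: float,
                         baseline: float = 0.0) -> List[float]:
    """Weight applied to each step's log-probability gradient.

    Assumption: the only learning signal is one reward at the end of the
    episode, so every step receives the same weight regardless of whether
    that particular step was a good or bad decision.
    """
    advantage = final_reward - baseline
    return [advantage] * num_steps


def surrogate_loss(log_probs: List[float], final_reward: float,
                   baseline: float = 0.0) -> float:
    """REINFORCE-style surrogate loss: -sum_t (R - b) * log pi(a_t | s_t)."""
    weights = outcome_only_weights(len(log_probs), final_reward, baseline)
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Hypothetical log-probabilities of three actions taken during one episode.
log_probs = [-0.2, -1.5, -0.7]

# Success: every action is pushed up by the same amount.
success = outcome_only_weights(len(log_probs), final_reward=1.0)

# Failure: every action is pushed down by the same amount.
failure = outcome_only_weights(len(log_probs), final_reward=-1.0)
```

In this sketch a ten-year episode would contribute a single scalar, spread uniformly over every decision made along the way, which is the sparsity the question is pointing at.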