Dwarkesh Patel

👤 Speaker
15444 total appearances

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy - AGI is still a decade away

Now, obviously these engineers evaluated these interactions on a pass-fail basis, but they also rated every single response on a bunch of different dimensions like readability and performance.

And they wrote down their thought processes for every single rating that they gave.

So you're basically showing every single step an engineer takes and every single thought that they have while they're doing their job.

And this is just something you could never get from usage data alone.

And so Labelbox packaged up all these evaluations and included all the agent trajectories and the corrective human edits for the customer to train on.

This is just one example, so go check out how Labelbox can get you high-quality frontier data across domains, modalities, and training paradigms.

Reach out at labelbox.com/dwarkesh.

Let's talk about RL a bit.

You've said some very interesting things about this.

Conceptually, how should we think about the way that humans are able to build a rich world model just from interacting with our environment, and in ways that seem almost irrespective of the final reward at the end of the episode?

If somebody's starting a business, and at the end of 10 years she finds out whether the business succeeded or failed, we say that she's earned a bunch of wisdom and experience. But it's not because the log probs of every single thing that happened over the last 10 years are up-weighted or down-weighted.

Something much more deliberate and rich is happening.

What is the ML analogy, and how does that compare to what we're doing with LLMs right now?

But you're so good at coming up with evocative phrases.

Sucking supervision through a straw is, like, so good.

So you're saying your problem with outcome-based reward is that you have this huge trajectory, and then at the end you're trying to learn every single possible thing about what you should do and what you should learn about the world from that one final bit.

Given the fact that this is obvious, why hasn't process-based supervision as an alternative been a successful way to make models more capable?

What has been preventing us from using this alternative paradigm?
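
The contrast drawn in this question can be made concrete with a toy sketch. This is not a method described in the episode; it is a deliberately simplified illustration, and the function names `outcome_credit` and `process_credit`, the five-step trajectory, and the per-step scores are all hypothetical. It assumes a policy-gradient-style update in which each step's log prob is scaled by a per-step weight, with no discounting.

```python
from typing import List


def outcome_credit(num_steps: int, final_reward: float, baseline: float = 0.0) -> List[float]:
    """Outcome-based reward: one scalar at the end is broadcast to every step.

    Each returned value is the weight that would scale that step's log prob
    in a simplified policy-gradient update (assumption: no discounting).
    """
    advantage = final_reward - baseline
    return [advantage] * num_steps


def process_credit(step_scores: List[float], baseline: float = 0.0) -> List[float]:
    """Process-based supervision: every step gets its own score.

    The per-step scores are hypothetical; in practice something has to
    produce them, for example a human rater or a learned judge.
    """
    return [score - baseline for score in step_scores]


# Hypothetical 5-step trajectory: step 3 was a mistake, but the episode succeeded.
outcome_weights = outcome_credit(num_steps=5, final_reward=1.0)
process_weights = process_credit([1.0, 1.0, -1.0, 1.0, 1.0])

print("outcome:", outcome_weights)
print("process:", process_weights)
```

Under the outcome-based scheme the mistaken third step is up-weighted along with everything else, because the single final reward cannot distinguish it from the steps that actually helped. The process-based scheme can down-weight it, but only if a reliable per-step score is available.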