
Aman Sanger

👤 Speaker
350 total appearances

Podcast Appearances

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

And so OpenAI had some preliminary paper on this, I think, last summer, where they used human labelers to get this pretty large data set, several hundred thousand in size, of graded chains of thought. Ultimately, I haven't seen anything interesting in the ways that people use process reward models beyond just using them as a means of choosing between a bunch of samples.

So what people do in all these papers is sample a bunch of outputs from the language model, use the process reward model to grade all those generations, maybe alongside some other heuristics, and then use that to choose the best answer. The really interesting thing that people think might work, and want to work, is tree search with these process reward models.
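The sample-then-grade approach described above can be sketched roughly as follows. This is a minimal illustration, not the method from any particular paper: `toy_step_scorer` is a hypothetical stand-in for a trained process reward model, and taking the minimum step score as the chain score is just one assumed way to aggregate.

```python
from typing import Callable, List, Sequence

# A chain of thought is represented as a list of reasoning steps.
Chain = List[str]
# Hypothetical scorer signature: (steps so far, next step) -> score in [0, 1].
StepScorer = Callable[[Sequence[str], str], float]


def chain_score(chain: Sequence[str], step_scorer: StepScorer) -> float:
    """Aggregate per-step scores; min is an assumption, so one bad step sinks the chain."""
    if not chain:
        return 0.0
    return min(step_scorer(chain[:i], step) for i, step in enumerate(chain))


def best_of_n(candidates: Sequence[Chain], step_scorer: StepScorer) -> Chain:
    """Grade every sampled chain and keep the highest-scoring one."""
    return max(candidates, key=lambda chain: chain_score(chain, step_scorer))


def toy_step_scorer(prefix: Sequence[str], step: str) -> float:
    """Toy stand-in for a process reward model: penalize steps that admit guessing."""
    return 0.1 if "guess" in step else 0.9


# Pretend these were sampled from a language model.
candidates = [
    ["2 + 2 = 4", "guess the rest", "answer: 9"],
    ["2 + 2 = 4", "4 * 2 = 8", "answer: 8"],
]
best = best_of_n(candidates, toy_step_scorer)
print(best[-1])
```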

Because if you really can grade every single step of the chain of thought, then you can branch out and explore multiple paths of this chain of thought, and then use these process reward models to evaluate how good the branch you're taking is.
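One way to picture branching over a chain of thought is a small beam search, where a scorer grades each partial chain and only the best few branches are expanded further. The sketch below is an assumption-laden toy: the "steps" are arithmetic operations, and `score_prefix` (distance to a target number) stands in for a process reward model. All function names here are hypothetical.

```python
from typing import Callable, List

Prefix = List[str]


def tree_search(
    propose: Callable[[Prefix], List[str]],
    score_prefix: Callable[[Prefix], float],
    is_done: Callable[[Prefix], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Prefix:
    """Beam search over partial chains, keeping the top-scoring branches at each depth."""
    beams: List[Prefix] = [[]]
    for _ in range(max_depth):
        expanded: List[Prefix] = []
        for prefix in beams:
            if is_done(prefix):
                expanded.append(prefix)  # finished branches are carried forward unchanged
                continue
            for step in propose(prefix):
                expanded.append(prefix + [step])
        # The scorer plays the role of the process reward model here.
        beams = sorted(expanded, key=score_prefix, reverse=True)[:beam_width]
        if all(is_done(b) for b in beams):
            break
    return beams[0]


START, TARGET = 1, 10


def value(prefix: Prefix) -> int:
    """Apply each step such as '+3' or '*2' to the starting number."""
    v = START
    for step in prefix:
        op, n = step[0], int(step[1:])
        v = v + n if op == "+" else v * n if op == "*" else v - n
    return v


result = tree_search(
    propose=lambda prefix: ["+3", "*2", "-1"],
    score_prefix=lambda prefix: -abs(TARGET - value(prefix)),
    is_done=lambda prefix: value(prefix) == TARGET,
)
print(result, value(result))
```

In a real setting the proposals would come from sampling the language model and the scores from a learned model, which is much noisier than this exact distance function.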

And the interesting work that I think has been done, or at least the interesting work that has been open sourced and that people talk about, is how to train the process reward models, maybe in a more automated way.

I could be wrong here, I could be failing to mention something, but I haven't seen anything that seems to work really well for using the process reward models creatively to do tree search and code.

But it has these significant limitations. Even barring capabilities, it does not stream. And that means it's really, really painful to use for things where you want to supervise the output. And instead, you're just waiting for the wall of text to show up. Also, it does feel like the early innings of test time compute and search, where it's just very, very much a v0.

And there are so many things that don't feel quite right. And I suspect that, in parallel to people increasing the amount of pre-training data and the size of the models in pre-training and finding tricks there, you'll now have this other thread of getting search to work better and better.

Yeah, I think most of the additional value from Cursor versus everything else out there is not just integrating the new model fast, like o1. It comes from all of the depth that goes into these custom models that you don't realize are working for you in every facet of the product, as well as the really thoughtful UX with every single feature.

Oh, yeah.

Yeah, I think there are three main kinds of synthetic data. But first, what is synthetic data? There's normal data, non-synthetic data, which is just data that's naturally created, i.e. usually it'll be from humans having done things. So from some human process, you get this data. For synthetic data, the first kind would be distillation.

So you have a language model output tokens, or probability distributions over tokens, and then you can train some less capable model on this. This approach is not going to get you a more capable model than the original one that produced the tokens.

But it's really useful if there's some capability you want to elicit from some really expensive, high-latency model: you can then distill that down into some smaller, task-specific model. The second kind is when one direction of the problem is easier than the reverse.
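To make the distillation idea above concrete, here is a minimal sketch of training a student on a teacher's probability distribution over tokens. It is a toy under stated assumptions: a three-token vocabulary, a single context, and a student that is nothing more than a vector of logits updated by plain gradient descent on the cross-entropy against the teacher's soft targets.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(teacher_probs: List[float], student_logits: List[float]) -> float:
    """Cross-entropy of the student's distribution against the teacher's soft targets."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


def distill_step(teacher_probs: List[float], student_logits: List[float], lr: float = 0.5) -> List[float]:
    """One gradient step; d(loss)/d(logit_i) = student_prob_i - teacher_prob_i."""
    student_probs = softmax(student_logits)
    return [z - lr * (s - t) for z, s, t in zip(student_logits, student_probs, teacher_probs)]


# Assumed teacher output for one context over a toy 3-token vocabulary.
teacher_probs = softmax([2.0, 0.5, -1.0])
student_logits = [0.0, 0.0, 0.0]

initial_loss = distill_loss(teacher_probs, student_logits)
for _ in range(500):
    student_logits = distill_step(teacher_probs, student_logits)
final_loss = distill_loss(teacher_probs, student_logits)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The loss bottoms out when the student's distribution matches the teacher's, which fits the point made above: the student can approach the teacher on this task but has no signal that would take it beyond the teacher.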

And so a great example of this is bug detection, like we mentioned earlier, where it's a lot easier to introduce reasonable-looking bugs than it is to actually detect them. And this is probably the case for humans too. And so what you can do is get a model that's not trained on that much data, that's not that smart, to introduce a bunch of bugs in code.
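The "easy direction" can be illustrated with a small sketch that takes clean snippets and produces labeled buggy variants. Note the swap: the speaker describes using a language model to introduce the bugs, whereas this example uses a hypothetical rule-based mutator (`MUTATIONS`, `inject_bug`) purely to show the shape of the resulting data set.

```python
import random
from typing import Dict, List, Optional

# Hypothetical mutation rules standing in for a bug-introducing model.
MUTATIONS = [("<=", "<"), ("==", "!="), ("+ 1", "- 1"), (" and ", " or ")]


def inject_bug(source: str, rng: random.Random) -> Optional[str]:
    """Apply one applicable mutation to the source, or return None if none applies."""
    applicable = [(old, new) for old, new in MUTATIONS if old in source]
    if not applicable:
        return None
    old, new = rng.choice(applicable)
    return source.replace(old, new, 1)


def make_dataset(snippets: List[str], seed: int = 0) -> List[Dict[str, object]]:
    """Build labeled rows: each clean snippet plus a buggy variant when one exists."""
    rng = random.Random(seed)
    rows: List[Dict[str, object]] = []
    for source in snippets:
        rows.append({"code": source, "has_bug": False})
        buggy = inject_bug(source, rng)
        if buggy is not None:
            rows.append({"code": buggy, "has_bug": True})
    return rows


snippets = [
    "def inc(x):\n    return x + 1\n",
    "def is_zero(x):\n    return x == 0\n",
]
dataset = make_dataset(snippets)
for row in dataset:
    print(row["has_bug"], repr(row["code"]))
```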

And then you can use that synthetic data to train a model that can be really good at detecting bugs. The last category, I think, is the main one that it feels like the big labs are doing for synthetic data, which is producing text with language models that can then be verified easily.

So, you know, an extreme example of this is if you have a verification system that can detect whether language is Shakespeare-level, and then you have a bunch of monkeys typing on typewriters, you can eventually get enough training data to train a Shakespeare-level language model.

And this is very much the case for math, where verification is actually really, really easy for formal languages. Then what you can do is have an okay model generate a ton of rollouts, choose the ones that you know have actually proved the ground truth theorems, and then train that model further.

There are similar things you can do for code with LeetCode-like problems, where you have some set of tests such that, if something passes these tests, you know it has actually solved the problem. You can do the same thing: verify that it has passed the tests and then train the model on the outputs that have passed the tests.
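The generate-then-verify loop for code described above might look roughly like this. It is a sketch under assumptions: the candidate programs are hard-coded strings standing in for model rollouts, the entry point name `solve` is hypothetical, and `exec` is used without any sandboxing, which would not be safe for real model output.

```python
from typing import List, Sequence, Tuple

# Each test is (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def passes_tests(source: str, tests: Sequence[TestCase], func_name: str = "solve") -> bool:
    """Run a candidate program and report whether it passes every test."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsafe for untrusted code; sandbox this in practice
        fn = namespace[func_name]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


def filter_verified(candidates: Sequence[str], tests: Sequence[TestCase]) -> List[str]:
    """Keep only the rollouts that the tests verify; these become training data."""
    return [source for source in candidates if passes_tests(source, tests)]


tests = [((2, 3), 5), ((-1, 1), 0)]
candidates = [
    "def solve(a, b):\n    return a + b\n",  # correct
    "def solve(a, b):\n    return a * b\n",  # wrong answer
    "def solve(a, b):\n    return a +\n",    # does not even parse
]
training_set = filter_verified(candidates, tests)
print(len(training_set), "of", len(candidates), "rollouts kept")
```

The filter is only as good as the tests: a candidate that passes a weak test suite without solving the problem would still end up in the training set.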

I think it's gonna be a little tricky getting this to work in all domains, or just in general. Having the perfect verifier feels really, really hard to do with open-ended miscellaneous tasks you give the model, or with more long-horizon tasks, even in coding.

Yeah. Verification, it feels like it's best when you know for a fact that it's correct. And then it wouldn't be using a language model to verify. It would be using tests or formal systems. Or running the thing, too.

Yeah, yeah.