Aman Sanger
Podcast Appearances
I think that's the category that is most likely to result in massive gains.
Yeah, so RLHF is when the reward model you use is trained from labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for the kind of task you care about.
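A minimal sketch of what that reward-model training step looks like, assuming a standard pairwise (Bradley-Terry) preference loss. This is illustrative only, not a description of any particular production setup; the toy encoder and random feature tensors are stand-ins for a real pretrained model and real encoded (prompt, completion) pairs.

```python
# Sketch of RLHF-style reward-model training: a human picks the better of two
# completions for the same prompt, and the reward model is trained so the
# chosen completion scores higher than the rejected one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        # Stand-in encoder: in practice this would be a pretrained transformer.
        self.encoder = nn.Linear(embed_dim, embed_dim)
        self.score_head = nn.Linear(embed_dim, 1)  # scalar reward

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score_head(torch.tanh(self.encoder(features))).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    # Maximize log sigmoid(r_chosen - r_rejected): the human-preferred
    # completion should receive the higher scalar reward.
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()

# Toy batch standing in for encoded (prompt, completion) pairs.
model = RewardModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
chosen, rejected = torch.randn(32, 768), torch.randn(32, 768)
loss = preference_loss(model, chosen, rejected)
loss.backward()
opt.step()
```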
RLAIF is interesting because it depends on the constraint that verification is actually a decent bit easier than generation. Because it feels like, okay, what are you doing? Are you using this language model to look at the language model's outputs and then improve the language model?
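A rough sketch of that RLAIF loop under the stated assumption that judging two candidates is easier than producing a good one. The `generate` and `judge_prefers_first` callables are hypothetical stand-ins, not a real API; the AI-labeled triples they produce would feed the same kind of reward-model training as human labels would.

```python
# RLAIF sketch: use a model as a judge over its own candidate outputs to
# produce preference labels, then train on those labels.
from typing import Callable

def collect_ai_preferences(prompts: list[str],
                           generate: Callable[[str], str],
                           judge_prefers_first: Callable[[str, str, str], bool]):
    """Return (prompt, chosen, rejected) triples labeled by the model itself."""
    triples = []
    for prompt in prompts:
        a = generate(prompt)  # candidate 1
        b = generate(prompt)  # candidate 2 (e.g. sampled at higher temperature)
        if judge_prefers_first(prompt, a, b):
            triples.append((prompt, a, b))
        else:
            triples.append((prompt, b, a))
    return triples
```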
But no, it actually may work if the language model has a much easier time verifying some solution than it does generating it. Then you could perhaps get this kind of recursive loop. I don't think it's going to look exactly like that. The other thing you could do, and what we kind of do, is a little bit of a mix of RLAIF and RLHF, where usually the model is actually quite correct.
And this is, in the case of Cursor Tab, picking between two possible generations which is the better one. And then it just needs a little bit of human nudging, with only on the order of 50 to 100 examples, to kind of align the prior the model has with exactly what you want. It looks different than normal RLHF, I think, where you're usually training these reward models on tons of examples.
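One way to read that mixed recipe: lean on a large AI-labeled preference set to establish the model's prior, then nudge it with a small human-labeled set. The sketch below is an assumption about how such a two-stage schedule might look, reusing the `RewardModel` / `preference_loss` pieces from the earlier sketch; it is not Cursor's actual training code.

```python
# Two-stage preference training: many AI-labeled pairs, then a small human nudge.
import torch

def train_on_pairs(model, loss_fn, pairs, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for chosen, rejected in pairs:
        opt.zero_grad()
        loss_fn(model, chosen, rejected).backward()
        opt.step()

# Stage 1: a large AI-labeled preference set establishes the prior.
#   train_on_pairs(model, preference_loss, ai_labeled_pairs, lr=1e-4)
# Stage 2: ~50-100 human-labeled pairs align that prior with what you want,
# at a smaller learning rate so the nudge doesn't wash out stage 1.
#   train_on_pairs(model, preference_loss, human_labeled_pairs[:100], lr=1e-5)
```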
My intuition would just say, yeah, it should be. This is kind of going back to: if you believe P does not equal NP, then there's this massive class of problems that are much, much easier to verify given a proof than to actually prove. I wonder if the same thing will prove P not equal to NP, or P equal to NP. That would be really cool.
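A toy illustration of that verification/generation gap, using SAT as the example problem (my choice of example, not one from the conversation): checking a candidate assignment against a CNF formula takes one linear pass, while finding a satisfying assignment by brute force is exponential in the number of variables, and no known algorithm does fundamentally better if P does not equal NP.

```python
# Verifying a SAT "proof" (an assignment) is cheap; searching for one is not.
from itertools import product

# CNF formula as a list of clauses; each literal is (variable_index, is_positive).
# Encodes (x0 or not x1) and (x1 or x2) and (not x0 or not x2).
formula = [[(0, True), (1, False)], [(1, True), (2, True)], [(0, False), (2, False)]]

def verify(assignment: list[bool]) -> bool:
    # Fast: one pass over the clauses.
    return all(any(assignment[v] == pos for v, pos in clause) for clause in formula)

def generate(num_vars: int):
    # Slow: exhaustive search over 2**num_vars assignments.
    for bits in product([False, True], repeat=num_vars):
        if verify(list(bits)):
            return list(bits)
    return None

print(verify([True, True, False]))  # cheap check of a given candidate -> True
print(generate(3))                  # expensive search for a satisfying assignment
```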
I feel like I have much more to do there. It felt like the path to get to IMO was a little more clear, because it already could get a few IMO problems, and there was a bunch of low-hanging fruit, given the literature at the time, in terms of what tactics people could take. I think, for one, I'm much less versed in the space of theorem proving now.