Aman Sanger
👤 PersonAppearances Over Time
Podcast Appearances
Yeah, I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. You know, code review kind of sucks. Like, you spend a lot of time trying to grok this code that's often quite unfamiliar to you, and...
Yeah, I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. You know, code review kind of sucks. Like, you spend a lot of time trying to grok this code that's often quite unfamiliar to you, and...
Yeah, I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. You know, code review kind of sucks. Like, you spend a lot of time trying to grok this code that's often quite unfamiliar to you, and...
it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvind had described of maybe pointing you towards the regions that actually matter.
it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvind had described of maybe pointing you towards the regions that actually matter.
it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvind had described of maybe pointing you towards the regions that actually matter.
I think also if the code is produced by these language models and it's not produced by someone else, like the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is the language model, You don't have to care that much about their experience.
I think also if the code is produced by these language models and it's not produced by someone else, like the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is the language model, You don't have to care that much about their experience.
I think also if the code is produced by these language models and it's not produced by someone else, like the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is the language model, You don't have to care that much about their experience.
And you can design the entire thing around the reviewers such that the reviewer's job is as fun, as easy, as productive as possible. And I think that feels like the issue with just kind of naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary on what's possible.
And you can design the entire thing around the reviewers such that the reviewer's job is as fun, as easy, as productive as possible. And I think that feels like the issue with just kind of naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary on what's possible.
And you can design the entire thing around the reviewers such that the reviewer's job is as fun, as easy, as productive as possible. And I think that feels like the issue with just kind of naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary on what's possible.
Well, Cursor really works via this ensemble of custom models that we've trained alongside the frontier models that are fantastic at the reasoning intense things. And so CursorTab, for example, is a great example of where you can specialize this model to be even better than even frontier models if you look at evals on the task we set it at.
Well, Cursor really works via this ensemble of custom models that we've trained alongside the frontier models that are fantastic at the reasoning intense things. And so CursorTab, for example, is a great example of where you can specialize this model to be even better than even frontier models if you look at evals on the task we set it at.
Well, Cursor really works via this ensemble of custom models that we've trained alongside the frontier models that are fantastic at the reasoning intense things. And so CursorTab, for example, is a great example of where you can specialize this model to be even better than even frontier models if you look at evals on the task we set it at.
The other domain, which it's kind of surprising that it requires custom models, but it's kind of necessary and works quite well, is in apply. So I think these models are like the frontier models are quite good at sketching out plans for code and generating like rough sketches of like the change. But actually, Creating diffs is quite hard for frontier models, for your training models.
The other domain, which it's kind of surprising that it requires custom models, but it's kind of necessary and works quite well, is in apply. So I think these models are like the frontier models are quite good at sketching out plans for code and generating like rough sketches of like the change. But actually, Creating diffs is quite hard for frontier models, for your training models.
The other domain, which it's kind of surprising that it requires custom models, but it's kind of necessary and works quite well, is in apply. So I think these models are like the frontier models are quite good at sketching out plans for code and generating like rough sketches of like the change. But actually, Creating diffs is quite hard for frontier models, for your training models.
You try to do this with Sonnet, with O1, any frontier model, and it really messes up stupid things like counting line numbers, especially in super, super large files. And so what we've done to alleviate this is we let the model kind of sketch out this rough code block that indicates what the change will be. And we train a model to then apply that change to the file.
You try to do this with Sonnet, with O1, any frontier model, and it really messes up stupid things like counting line numbers, especially in super, super large files. And so what we've done to alleviate this is we let the model kind of sketch out this rough code block that indicates what the change will be. And we train a model to then apply that change to the file.