Aman Sanger
Podcast Appearances
Yeah, so one big component of making it fast is speculative edits. Speculative edits are a variant of speculative decoding, and maybe it'd be helpful to briefly describe speculative decoding. With speculative decoding, you take advantage of the fact that most of the time (and I'll add the caveat that this is when you're memory bound in language model generation), if you process multiple tokens at once, it is faster than generating one token at a time. This is the same reason why, if you look at tokens per second for prompt tokens versus generated tokens, it's much, much faster for prompt tokens.
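The memory-bound caveat can be illustrated with a rough roofline-style estimate. This is a minimal sketch, not something described in the conversation: the function name `step_time_s`, the hardware figures, and the 7B parameter count are all illustrative assumptions, and it ignores the KV cache and attention cost.

```python
def step_time_s(num_tokens, param_count, bytes_per_param=2,
                mem_bandwidth=1e12, peak_flops=2e14):
    """Rough time for one forward pass over num_tokens tokens.

    All hardware numbers are hypothetical defaults chosen for illustration.
    """
    # The weights are read from memory once per pass, however many tokens
    # are processed in that pass.
    memory_time = param_count * bytes_per_param / mem_bandwidth
    # Roughly 2 FLOPs per parameter per token for a dense model.
    compute_time = 2 * param_count * num_tokens / peak_flops
    # The pass takes as long as the slower of the two.
    return max(memory_time, compute_time)


PARAMS = 7e9  # hypothetical 7B-parameter model

one_token = step_time_s(1, PARAMS)
sixteen_tokens = step_time_s(16, PARAMS)

# While memory bound, a pass over 16 tokens costs the same as a pass over 1,
# so throughput in tokens per second scales with the number of tokens.
print(1 / one_token, 16 / sixteen_tokens)
```

Under these assumed numbers the pass only becomes compute bound at a few hundred tokens, which is why processing a prompt, or verifying a batch of draft tokens, is so much cheaper per token than generating one token at a time.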
So instead of doing what speculative decoding normally does, which is using a really small model to predict draft tokens that your larger model will then go in and verify, with code edits we have a very strong prior of what the existing code will look like, and that prior is literally the same exact code.
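For reference, ordinary speculative decoding with a small draft model can be sketched as follows. This is a toy, greedy version under stated assumptions: `target_next` and `draft_next` are hypothetical stand-ins for the large and small models, each mapping a token list to its next token, and the single batched verification pass of a real system is simulated by calling `target_next` position by position.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[str]], str]


def speculative_decode(target_next: NextToken, draft_next: NextToken,
                       prompt: List[str], max_new: int,
                       k: int = 4) -> Tuple[List[str], int]:
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens, one at a time.
        draft: List[str] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. The target model checks all k positions (plus one bonus position).
        #    A real system does this in ONE forward pass; we count it as one.
        target_passes += 1
        accepted: List[str] = []
        for i in range(k + 1):
            expected = target_next(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if i == k or draft[i] != expected:
                break  # first disagreement, or bonus token after full match
        out.extend(accepted)
    return out[:len(prompt) + max_new], target_passes


# Toy models: the draft agrees with the target except on one word.
SEQ = "the quick brown fox jumps over the lazy dog".split()

def target_next(ctx: List[str]) -> str:
    return SEQ[len(ctx) % len(SEQ)]

def draft_next(ctx: List[str]) -> str:
    tok = SEQ[len(ctx) % len(SEQ)]
    return "cat" if tok == "fox" else tok

tokens, passes = speculative_decode(target_next, draft_next, [], len(SEQ))
print(tokens, passes)
```

The output is always exactly what the target model would have produced on its own; the draft only determines how many tokens each target pass manages to confirm.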
So what you can do is you can just feed chunks of the original code back into the model. And then the model will just pretty much agree most of the time that, okay, I'm just going to spit this code back out. And so you can process all of those lines in parallel. And you just do this with sufficiently many chunks. And then eventually you'll reach a point of disagreement.
At that point, the model will predict text that is different from the ground-truth original code. It'll generate those tokens, and then, after enough tokens match the original code again, we go back to speculating in chunks of code. What this actually ends up looking like is just a much faster version of normal code editing.
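The loop described here (speculate with chunks of the original code, fall back to normal generation at a disagreement, then resume speculating once the output lines up with the original again) might look roughly like the sketch below. It is an illustration only, not Cursor's implementation: `target_next`, `chunk`, `resync`, and the way the re-sync point is found are all assumptions made for this example, and the batched verification pass is simulated token by token.

```python
from typing import Callable, List, Optional, Tuple

EOS = "<eos>"


def find_after(original: List[str], start: int, tail: List[str]) -> Optional[int]:
    """Index just past the first occurrence of tail in original at or after start."""
    n = len(tail)
    for i in range(start, len(original) - n + 1):
        if original[i:i + n] == tail:
            return i + n
    return None


def speculative_edit(target_next: Callable[[List[str]], str],
                     original: List[str], chunk: int = 4, resync: int = 2,
                     max_tokens: int = 10_000) -> Tuple[List[str], int]:
    """Rewrite `original` with the model, drafting from the original itself."""
    out: List[str] = []
    ptr: Optional[int] = 0       # where we are in the original; None = diverged
    search_from = 0              # where to look for a re-sync point
    free_run: List[str] = []     # tokens generated since the last divergence
    passes = 0                   # simulated forward passes of the model
    while len(out) < max_tokens:  # safety guard for the toy loop
        passes += 1
        # Draft = next chunk of the original code (empty while diverged).
        draft = original[ptr:ptr + chunk] if ptr is not None else []
        accepted = 0
        for tok in draft:
            if target_next(out) != tok:
                break
            out.append(tok)
            accepted += 1
        # The same pass also yields the model's own next token.
        nxt = target_next(out)
        if nxt == EOS:
            break
        out.append(nxt)
        if ptr is not None:
            ptr += accepted
            if ptr < len(original) and original[ptr] == nxt:
                ptr += 1                      # still copying the original
            else:
                search_from, ptr, free_run = ptr, None, [nxt]  # disagreement
        else:
            free_run.append(nxt)
        # Once enough freshly generated tokens match the original, resume.
        if ptr is None and len(free_run) >= resync:
            ptr = find_after(original, search_from, free_run[-resync:])
    return out, passes


ORIGINAL = "x = 1 ; y = 2 ; z = 3 ; w = 4 ;".split()
EDITED = "x = 1 ; y = 20 ; z = 3 ; w = 4 ;".split()

def target_next(ctx: List[str]) -> str:
    # Toy "model" that already knows the edited file.
    return EDITED[len(ctx)] if len(ctx) < len(EDITED) else EOS

result, passes = speculative_edit(target_next, ORIGINAL)
print(result == EDITED, passes, "passes for", len(EDITED), "tokens")
```

Because every emitted token is either confirmed or produced by the model itself, a wrong guess about where to re-sync only costs speed, never correctness: the next chunk is rejected and generation continues one token at a time.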
So it looks like a much faster version of the model rewriting all the code. We can use the exact same interface that we use for diffs, but it will just stream down a lot faster.
Yeah, I think there's no model that Pareto dominates the others, meaning it is better in all the categories that we think matter. The categories being speed, ability to edit code, ability to process lots of code, long context, a couple of other things, and coding capabilities. The one that I'd say right now is just kind of net best is Sonnet. I think this is a consensus opinion.
o1 is really interesting, and it's really good at reasoning. So if you give it really hard programming-interview-style problems or LeetCode problems, it can do quite well on them. But it doesn't feel like it understands your rough intent as well as Sonnet does.