Aman Sanger
You try to do this with Sonnet, with O1, with any frontier model, and it really messes up on stupid things like counting line numbers, especially in super, super large files. And so what we've done to alleviate this is we let the model sketch out a rough code block that indicates what the change will be, and we train a model to then apply that change to the file.
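To make that concrete, here's a hypothetical illustration of the sketch-then-apply flow. The truncation markers and prompt format below are assumptions made for the example, not Cursor's actual protocol.

```python
# Hypothetical illustration of sketch-then-apply; the marker syntax and
# prompt format here are assumptions, not Cursor's actual protocol.

# A frontier model emits a rough block like this instead of the whole file.
# "... existing code ..." stands in for regions it wants to leave untouched,
# so it never has to count line numbers or reproduce the file verbatim.
SKETCH = '''
def load_config(path):
    # ... existing code ...
    if config is None:
        raise ValueError(f"empty config at {path}")  # the actual change
    # ... existing code ...
'''

# A smaller model trained for this task then rewrites the full file with
# the sketched change merged in.
APPLY_PROMPT = """Merge the sketched change into the original file and
output the complete updated file.

<original_file>
{original}
</original_file>

<sketch>
{sketch}
</sketch>
"""
```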
Yeah. I think you see shallow copies of apply elsewhere, and it just breaks most of the time, because you think you can do some deterministic matching, and then it fails at least 40% of the time, and that just results in a terrible product experience. And I think in general we're in this regime where you're going to get smarter and smarter models.
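For illustration, here's a minimal sketch of the kind of deterministic matching those shallow apply clones rely on, and the failure mode being described; the function and its interface are hypothetical.

```python
def naive_apply(original: str, before: str, after: str) -> str:
    """Find the sketch's `before` snippet verbatim and swap in `after`.

    This breaks whenever the model's snippet doesn't byte-for-byte match
    the file: different whitespace, elided lines ("..."), reordered
    imports, or a file that changed since the sketch was written. Those
    mismatches are common enough to sink the product experience.
    """
    if original.count(before) != 1:
        # The failure case: no match found, or an ambiguous match.
        raise ValueError("deterministic match failed")
    return original.replace(before, after, 1)
```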
So one other thing that Apply lets you do is use fewer tokens with the most intelligent models, where generating all of those tokens is expensive both in latency and in cost. You can give this very, very rough sketch and then have your small models go and implement it, because it's a much easier task to implement this very sketched-out code.
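As a rough illustration of the economics, here's some back-of-envelope arithmetic; every price and token count below is made up for the example, not a real benchmark.

```python
# Illustrative arithmetic only: made-up prices and token counts.
FRONTIER_PRICE = 15.00 / 1_000_000   # $/output token, hypothetical
SMALL_PRICE    =  0.30 / 1_000_000   # $/output token, hypothetical

file_tokens   = 2_000   # rewriting the whole file with the frontier model
sketch_tokens =   150   # emitting just the rough sketch instead

full_rewrite = file_tokens * FRONTIER_PRICE
sketch_then_apply = sketch_tokens * FRONTIER_PRICE + file_tokens * SMALL_PRICE

print(f"full rewrite:   ${full_rewrite:.4f} per edit")
print(f"sketch + apply: ${sketch_then_apply:.4f} per edit")
# Latency improves for the same reason: the slow model generates ~13x
# fewer tokens, and the fast apply model picks up the rest.
```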
And I think that this regime will continue, where you can use smarter and smarter models to do the planning, and then maybe the implementation details can be handled by the less intelligent ones. Perhaps you'll have O1, or maybe even more capable models, given an even higher-level plan that is recursively implemented, first by Sonnet and then by the apply model.
Fast is always an interesting detail. Fast is good.
Yeah, so one big component of making it fast is speculative edits. Speculative edits are a variant of speculative decoding, so maybe it'd be helpful to briefly describe speculative decoding first. With speculative decoding, you take advantage of a fact that holds most of the time, with the caveat that it applies when you're memory-bound in language model generation.
If you process multiple tokens at once, it is faster than generating one token at a time. This is the same reason why, if you look at tokens per second for prompt tokens versus generated tokens, it's much, much faster for prompt tokens.
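Here's a minimal sketch of one draft-and-verify step of speculative decoding, assuming greedy decoding, batch size 1, and toy model interfaces that map token ids to logits. The published algorithm uses a rejection-sampling acceptance rule to match the target distribution exactly; greedy acceptance keeps the sketch short, and none of this is Cursor's implementation.

```python
import torch

def speculative_decode_step(target, draft, ctx, k=8):
    """One step of (greedy) speculative decoding; a simplified sketch.

    Assumes batch size 1 and that both models map token ids (B, T)
    to logits (B, T, V).
    """
    # 1) The small draft model cheaply proposes k tokens, one at a time.
    proposal = ctx
    for _ in range(k):
        logits = draft(proposal)
        next_tok = logits[:, -1:].argmax(-1)          # (B, 1)
        proposal = torch.cat([proposal, next_tok], dim=1)

    # 2) The big target model scores all k drafts in ONE forward pass.
    #    Verifying positions in parallel is nearly free when decoding is
    #    memory-bound: each step streams the full weights from memory,
    #    and a batch of positions reuses that single weight read.
    logits = target(proposal)
    t = ctx.shape[1]
    preds = logits[:, t - 1:-1].argmax(-1)            # target's picks (B, k)
    drafted = proposal[:, t:]                         # draft's picks  (B, k)

    # 3) Keep the longest prefix where target and draft agree, then take
    #    one extra token from the target, so we always advance >= 1 token.
    agree = (preds == drafted).long().cumprod(dim=-1)
    n = int(agree.sum())
    bonus = logits[:, t - 1 + n].argmax(-1, keepdim=True)
    return torch.cat([ctx, drafted[:, :n], bonus], dim=1)
```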
So what we do is, instead of what speculative decoding normally does, which is use a really small model to predict the draft tokens that your larger model then goes in and verifies, with code edits we have a very strong prior on what the existing code will look like. And that prior is literally the same exact code.
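And here's how that prior changes the loop: a minimal speculative-edits sketch in the same style, where the draft tokens come from the original file rather than a draft model. Again, the interfaces are assumptions, and realigning with the original after an insertion or deletion is elided.

```python
import torch

def speculative_edit_step(target, ctx, original, pos, k=16):
    """One step of speculative edits; a simplified greedy sketch.

    The twist versus speculative decoding: there is no draft model. The
    "draft" is the original file itself, since the edited file will be
    nearly identical to it. The target model verifies a whole chunk of
    unchanged code per forward pass and only falls back to token-by-token
    generation where the edit actually diverges.
    """
    draft = original[:, pos:pos + k]                  # prior: code unchanged
    proposal = torch.cat([ctx, draft], dim=1)
    logits = target(proposal)                         # one verification pass
    t = ctx.shape[1]
    preds = logits[:, t - 1:-1].argmax(-1)
    agree = (preds == draft).long().cumprod(dim=-1)
    n = int(agree.sum())                              # accepted chunk length
    bonus = logits[:, t - 1 + n].argmax(-1, keepdim=True)
    # Re-syncing `pos` with the original after an insertion or deletion is
    # the fiddly part of a real implementation and is elided here.
    return torch.cat([ctx, draft[:, :n], bonus], dim=1), pos + n
```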