Michael Truell
And that was kind of the kernel of the idea behind a good autocomplete: predicting what you're gonna do next. But you can make that concept even more ambitious by not just predicting the characters after your cursor, but actually predicting the next entire change you're gonna make, the next diff, the next place you're gonna jump to.
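(As a rough illustration of that idea, here's a minimal sketch of framing "predict the next diff" as a model prompt; the Edit structure, prompt format, and predict_next_edit stub are hypothetical, not Cursor's actual implementation.)

```python
# Sketch: frame "predict the next edit" as a prompt built from recent diffs.
# Everything here is a hypothetical illustration, not a real editor API.
from dataclasses import dataclass


@dataclass
class Edit:
    """One recent change, stored as a unified diff plus its location."""
    file_path: str
    diff: str  # unified-diff text of the change the user just made


def build_next_edit_prompt(recent_edits: list[Edit], cursor_file: str, cursor_line: int) -> str:
    """Assemble recent edits and the cursor position into a prompt that asks
    for the next whole diff, not just the characters after the cursor."""
    history = "\n\n".join(f"# {e.file_path}\n{e.diff}" for e in recent_edits)
    return (
        "You are predicting a programmer's next edit.\n"
        f"Recent edits:\n{history}\n\n"
        f"Cursor is at {cursor_file}:{cursor_line}.\n"
        "Respond with the next edit as a unified diff, including the file "
        "and the place to jump to."
    )


def predict_next_edit(prompt: str) -> str:
    """Placeholder for a model call; a real system would send `prompt` to a
    fast, code-tuned model and parse a diff out of its response."""
    return "--- a/app.py\n+++ b/app.py\n@@ (model-proposed change would go here)"


if __name__ == "__main__":
    edits = [Edit("app.py", "@@ -10,1 +10,2 @@\n-def load():\n+def load(path):\n+    ...")]
    print(predict_next_edit(build_next_edit_prompt(edits, "app.py", 42)))
```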
And the second thing Cursor is pretty good at right now, too, is helping you sometimes jump ahead of the AI and tell it what to do and go from instructions to code. And on both of those, we've done a lot of work on making the editing experience for those things ergonomic and also making those things smart and fast.
Yeah, and the magic moment would be if... Programming is this weird discipline where sometimes the next five minutes, not always, but sometimes the next five minutes of what you're going to do is actually predictable from the stuff you've done recently.
And so can you get to a world where that next five minutes either happens by you disengaging and it taking you through, or maybe a little bit more of just you seeing, next step, what it's going to do, and you're like, okay, that's good, that's good, that's good, and you can just sort of tap, tap, tap through these big changes.
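(A minimal sketch of what that tap-through flow could look like, assuming a hypothetical stream of model-proposed diffs; propose_edits and apply_diff are placeholder names, not a real editor API.)

```python
# Sketch: step through model-proposed edits one tap at a time.
# propose_edits() and apply_diff() are hypothetical placeholders.
from typing import Iterable


def propose_edits() -> Iterable[str]:
    """Stand-in for a model streaming the next few diffs it predicts."""
    yield "--- a/models.py\n+++ b/models.py\n@@ rename field user_id -> owner_id"
    yield "--- a/views.py\n+++ b/views.py\n@@ update call sites to owner_id"


def apply_diff(diff: str) -> None:
    """Stand-in for actually applying the diff to the working tree."""
    print("applied:\n" + diff + "\n")


def tap_through() -> None:
    # Show each predicted step; Enter accepts it, "s" skips it,
    # "q" stops and hands control back to the programmer.
    for diff in propose_edits():
        answer = input(f"Next step:\n{diff}\nAccept? [Enter/s/q] ").strip().lower()
        if answer == "q":
            break
        if answer != "s":
            apply_diff(diff)


if __name__ == "__main__":
    tap_through()
```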
I think that this benchmark question is both complicated by what Sualeh just mentioned, and then also, to what Aman was getting into, there's this problem of the skew between what you can actually model in a benchmark versus real programming. And that can sometimes be hard to encapsulate, because real programming is very messy.
And sometimes it isn't super well specified what's correct or what isn't. But then it's also doubly hard because of this public benchmark problem. And that's both because public benchmarks are sometimes kind of hill-climbed on, but then it's also really, really hard to keep the data from the public benchmarks out of the models.
And so, for instance, one of the most popular agent benchmarks, SWE-bench, is really, really contaminated in the training data of these foundation models. And so if you ask these foundation models to do a SWE-bench problem, but you actually don't give them the context of a codebase, they can hallucinate the right file paths, they can hallucinate the right function names.
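(A minimal sketch of how one might probe for that kind of contamination, assuming a placeholder query_model call; this is an illustration, not SWE-bench's official evaluation tooling.)

```python
# Illustrative contamination probe: give a model only the issue text, with no
# repository context, and check whether the file paths it names happen to exist
# in the real repo. If a model that never saw the codebase keeps naming real
# paths, the benchmark has likely leaked into its training data.
# query_model() is a placeholder for any chat-model call.
import os
import re


def query_model(prompt: str) -> str:
    """Placeholder: send `prompt` to a foundation model and return its reply."""
    return "You probably need to edit django/db/models/query.py"


def probe_contamination(issue_text: str, repo_root: str) -> list[tuple[str, bool]]:
    prompt = (
        "Here is a bug report. Without seeing the codebase, which files "
        f"would you edit to fix it?\n\n{issue_text}"
    )
    reply = query_model(prompt)
    # Pull out anything that looks like a Python file path and test it against the repo.
    candidates = re.findall(r"[\w/.-]+\.py", reply)
    return [(path, os.path.exists(os.path.join(repo_root, path))) for path in candidates]


if __name__ == "__main__":
    results = probe_contamination("QuerySet.filter() raises on chained lookups...", "./django")
    for path, exists in results:
        print(f"{path}: {'exists in repo' if exists else 'not found'}")
```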
And so it's also just that the public aspect of these things is tricky.