Arvid Lundmark
Podcast Appearances
But then the idea is we show you: do you just want to also add this file, this file, and this file, to tell the model to edit those files for you? Because maybe if you're making the API, you should also edit the client that is using the API and the server that is resolving it.
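A minimal sketch of how that file-suggestion step could work, assuming a Python repo on disk; the name-matching heuristic here is purely illustrative, not Cursor's actual implementation.

```python
import re
from pathlib import Path

def suggest_related_files(edited: Path, repo_root: Path) -> list[Path]:
    """Suggest files worth adding to context alongside `edited`: files that
    reference its module name, plus repo modules it imports."""
    module = edited.stem  # e.g. "api" for api.py
    related: set[Path] = set()

    # Files that mention the edited module, e.g. the client and server using the API.
    for path in repo_root.rglob("*.py"):
        if path != edited and re.search(rf"\b{re.escape(module)}\b",
                                        path.read_text(errors="ignore")):
            related.add(path)

    # Modules the edited file itself imports, if they live in this repo.
    for m in re.finditer(r"^\s*(?:from|import)\s+(\w+)", edited.read_text(), re.M):
        candidate = repo_root / f"{m.group(1)}.py"
        if candidate.exists():
            related.add(candidate)

    return sorted(related)
```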
So that'll be kind of cool, because there's the phase where you're writing the prompt, and before you even click enter, maybe we can help resolve some of the uncertainty.
And if you can make the KV cache smaller, one of the advantages you get is that maybe you can speculate even more. Maybe you can guess: here are the 10 things that could be useful. Predict the next 10, and it's possible the user hits one of the 10. That's a much higher chance than the user hitting the exact one that you show them.
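A minimal sketch of that speculation idea, with `sample_candidates` as a hypothetical stand-in for one batched model call; a smaller KV cache is what makes keeping k speculative candidates around affordable.

```python
import random

def sample_candidates(prefix: str, k: int) -> list[str]:
    """Stand-in for one batched model call returning k speculative
    continuations of `prefix` (a real system would run the LLM here)."""
    vocab = ["return total", "return None", "raise ValueError(msg)", "pass"]
    return random.sample(vocab, k=min(k, len(vocab)))

cache: dict[str, list[str]] = {}

def suggestions_for(prefix: str, k: int = 10) -> list[str]:
    """Speculate k candidates per prefix and keep all of them: the user
    hitting any one of the k is far likelier than hitting the single
    candidate we happened to display."""
    if prefix not in cache:
        cache[prefix] = sample_candidates(prefix, k)
    return cache[prefix]
```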
Maybe they type another character, and we sort of hit something else in the cache. So there are all these tricks here. The general phenomenon, which I think is also super useful for RL, is that maybe a single sample from the model isn't very good, but if you predict 10 different things, the probability that one of the 10 is right is much higher.
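Continuing the sketch above: a keystroke that matches any cached candidate is a free hit, and the "one of 10 is right" claim is just the arithmetic below, under an idealized independence assumption.

```python
def on_keystroke(candidates: list[str], char: str) -> str | None:
    """Given candidates already speculated for this prefix, serve a cache
    hit if the character the user just typed matches one of them."""
    matches = [c for c in candidates if c.startswith(char)]
    return matches[0] if matches else None

# If one sample is right with probability p, and samples were independent,
# at least one of k is right with probability 1 - (1 - p)**k.
p, k = 0.3, 10
print(f"single sample: {p:.2f}, one of {k}: {1 - (1 - p) ** k:.2f}")  # 0.30 vs ~0.97
```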
There are these pass@k curves. And part of what RL does is let you exploit this pass@k phenomenon to make many different predictions. One way to think about it is that the model sort of knows internally, has some uncertainty over, which of the k things is correct, or which of the k things the human wants.
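For reference, pass@k curves in code-model evaluations are usually computed with the standard unbiased estimator below (popularized by the Codex paper); whether Cursor uses exactly this is not stated here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: the probability that at least one of k
    samples, drawn from n total of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=100, c=10, k=1))   # ~0.10
print(pass_at_k(n=100, c=10, k=10))  # ~0.67 -- the curve rises quickly with k
```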
So when we RL our Cursor Tab model, one of the things we're doing is predicting which of the hundred different suggestions the model produces is more amenable to humans. Which of them do humans like more than the others? Maybe there's something where the model can predict very far ahead versus just a little bit, or maybe somewhere in the middle.
And then you can give a reward to the things that humans would like more, punish the things that they won't like, and then train the model to output the suggestions that humans would like more. You have these RL loops that are very useful, and they exploit these pass@k curves. Aman can maybe go into even more detail.
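A minimal sketch of the reward step in that loop; `policy_sample` and `preference_score` are hypothetical stand-ins for the Tab model and a learned human-preference scorer.

```python
from statistics import mean

def reward_step(policy_sample, preference_score, context, k: int = 100):
    """Sample k suggestions, score how much a human would like each one,
    and return (suggestion, advantage) pairs for a policy-gradient update."""
    suggestions = [policy_sample(context) for _ in range(k)]
    rewards = [preference_score(context, s) for s in suggestions]
    baseline = mean(rewards)
    # Positive advantage reinforces suggestions humans would like more;
    # negative advantage punishes the ones they wouldn't.
    return [(s, r - baseline) for s, r in zip(suggestions, rewards)]
```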