Aman Sanger
Podcast Appearances
Like you mentioned, I think this is only going to get more and more powerful in the future; we're working a lot on improving the quality of our retrieval. And I think the ceiling for that is much, much higher than people give it credit for.
Yeah, approximate nearest neighbors on this massive codebase is going to just eat up your memory and your CPU. And that's just that. Let's also talk about the modeling side where, as Arvid said, there are these massive headwinds against local models where, one, things seem to be moving towards MoEs.
One benefit is maybe they're more memory-bandwidth bound, which plays in favor of local versus using GPUs or using NVIDIA GPUs. But the downside is these models are just bigger in total. And they're often going to need to fit not even on a single node, but across multiple nodes. There's no way that's going to fit inside of even really good MacBooks. And I think especially for coding,
It's not so much a question of, does it clear some bar, like the model's good enough to do these things, and then we're satisfied, which may be the case for other problems and maybe where local models shine. But people are always going to want the best, the most intelligent, the most capable things. And that's going to be really, really hard to run for almost all people locally.
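To make the memory argument concrete, here is a rough back-of-envelope sketch in Python. Every number in it (the total and active parameter counts, the bytes per weight, the laptop RAM size) is an illustrative assumption, not a figure from the conversation. It only counts the memory needed to hold the weights and ignores the KV cache and activations. The point it illustrates is that an MoE has to keep all of its experts resident, so the total parameter count, not the smaller active count, decides whether it fits.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def fits_locally(params_billions: float, bytes_per_param: float, ram_gb: float) -> bool:
    """True if the weights alone fit in the given amount of RAM."""
    return weight_memory_gb(params_billions, bytes_per_param) <= ram_gb


# Hypothetical MoE: all figures below are assumptions for illustration only.
TOTAL_PARAMS_B = 400.0   # every expert must be resident in memory
ACTIVE_PARAMS_B = 50.0   # parameters actually used per token
LAPTOP_RAM_GB = 128.0    # an assumed high-end laptop

if __name__ == "__main__":
    for label, bytes_per_param in [("fp16", 2.0), ("int4", 0.5)]:
        total_gb = weight_memory_gb(TOTAL_PARAMS_B, bytes_per_param)
        active_gb = weight_memory_gb(ACTIVE_PARAMS_B, bytes_per_param)
        ok = fits_locally(TOTAL_PARAMS_B, bytes_per_param, LAPTOP_RAM_GB)
        print(f"{label}: total {total_gb:.0f} GB, active {active_gb:.0f} GB, fits: {ok}")
```

Under these assumed numbers the weights do not fit even at 4-bit precision, although a dense model the size of the active parameters alone would.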
Why do you think it's different than cloud providers?
One interesting proof of concept for learning this knowledge directly in the weights is VS Code. We're in a VS Code fork, and VS Code's code is all public. So these models have seen all the code in pre-training. They've probably also seen questions and answers about it, and then they've been fine-tuned and RLHFed to be able to answer questions about code in general.
So when you ask it a question about VS Code, sometimes it'll hallucinate, but sometimes it actually does a pretty good job of answering the question. And I think that's just because it happens to be okay at it. But what if you could actually specifically train or post-train a model such that it really was built to understand this codebase?