Sualeh Asif
Podcast Appearances
I have a few friends who are super senior engineers, and one of their lines is like, it's very hard to predict where systems will break when you scale them. You can sort of try to predict in advance, but there's always something weird that's going to happen when you add this extra zero. You thought you thought through everything, but you didn't actually think through everything.
But I think for that particular system, we've... So for concrete details, the thing we do is obviously we upload, we chunk up all of your code and then we send up sort of the code for embedding and we embed the code. And then we store the embeddings in a database, but we don't actually store any of the code. And then there's reasons around making sure that
We don't introduce client bugs because we're very, very paranoid about client bugs. We store much of the details on the server, like everything is sort of encrypted. So one of the technical challenges is always making sure that the local index, the local code base state is the same as the state that is on the server.
And the way sort of technically we ended up doing that is, so for every single file, you can sort of keep this hash. And then for every folder, you can sort of keep a hash, which is the hash of all of its children. And you can sort of recursively do that until the top. And why do something complicated? One thing you could do is you could keep a hash for every file.
Then every minute you could try to download the hashes that are on the server, figure out what are the files that don't exist on the server (maybe you just created a new file, maybe you just deleted a file, maybe you checked out a new branch), and try to reconcile the state between the client and the server. But that introduces, like, absolutely ginormous network overhead.
Both on the client side, I mean, nobody really wants us to hammer their Wi-Fi all the time if you're using Cursor. But also, like, I mean, it would introduce, like, ginormous overhead in the database. I mean, it would sort of be reading this tens-of-terabytes database, sort of approaching like 20 terabytes or something, like every second. That's just kind of crazy.
You definitely don't want to do that. So what you do, you sort of, you just try to reconcile the single hash, which is at the root of the project. And then if something mismatches, then you go, you find where all the things disagree. Maybe you look at the children and see if the hashes match. And if the hashes don't match, go look at their children and so on.
But you only do that in the scenario where things don't match. And for most people, most of the time, the hashes match.
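The hash-of-children scheme described here is commonly called a Merkle tree. What follows is a minimal sketch of the idea, not Cursor's actual implementation: the function names (`build_tree`, `diff`), the use of SHA-256, and the in-memory stand-in for the server state are all assumptions made for illustration. In a real client, each level of the comparison would be a network request rather than a dictionary lookup.

```python
import hashlib


def build_tree(node):
    """Build a hash tree from a nested dict.

    `node` is either bytes (file contents) or a dict mapping names to
    child nodes (a folder). Returns (hash, children), where children is
    None for a file. SHA-256 is an assumed choice of hash function.
    """
    if isinstance(node, bytes):
        return (hashlib.sha256(node).hexdigest(), None)
    children = {name: build_tree(child) for name, child in node.items()}
    h = hashlib.sha256()
    # A folder's hash is derived from the names and hashes of its children,
    # visited in sorted order so the result is deterministic.
    for name in sorted(children):
        h.update(name.encode() + b"\0" + children[name][0].encode() + b"\n")
    return (h.hexdigest(), children)


def diff(local, remote, path=""):
    """Return the paths where the two trees disagree.

    Subtrees whose hashes match are skipped entirely, so when nothing has
    changed only the root hash is ever compared.
    """
    if local is not None and remote is not None and local[0] == remote[0]:
        return []  # identical subtree: nothing below here needs checking
    if local is None or remote is None or local[1] is None or remote[1] is None:
        # Missing on one side, or a file: this path itself is the mismatch.
        return [path or "/"]
    changed = []
    for name in sorted(set(local[1]) | set(remote[1])):
        changed += diff(local[1].get(name), remote[1].get(name), f"{path}/{name}")
    return changed


# Hypothetical example: one edited file and one new file on the client.
server_state = {"README.md": b"hi", "src": {"a.py": b"print(1)", "b.py": b"old"}}
local_state = {
    "README.md": b"hi",
    "src": {"a.py": b"print(1)", "b.py": b"new", "c.py": b"x"},
}

changed_paths = diff(build_tree(local_state), build_tree(server_state))
print(changed_paths)  # ['/src/b.py', '/src/c.py']
```

In this example `README.md` and `src/a.py` are never examined beyond a single hash comparison each, and if the two root hashes had matched, the walk would have stopped immediately.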