Kyle d'Oliveira
So if you think about Rails migrations and how they're often applied, there are often times where you use a tool like Capistrano, which deploys some code, and as part of the deploy, a database migration gets run. And for most projects, that's fine.
For most small things, the migration that runs is fast and it's not a problem. So this is an example of a decision where we were kind of like, let's just inherit what the community uses. But as we started scaling out, we started encountering problems with it. So, for instance, we had a table where, if you ran a migration on it, it took 30 minutes.
This means that our deployment took 30 minutes and also timed out, so we lost all of the context of it. But also, during this period of time, the table was locked, so any queries that started going to that table stopped being answered, and all of our servers shut down. And we couldn't kill the ALTER TABLE because it was already mid-progress. And...
After it finished, we now had a table in a new state, but the code hadn't actually finished deploying, so now we were running into different problems. So this is a decision that makes a lot of sense when you're small: go really quick, because you can. But when you hit a certain level of scale...
you can no longer run with those assumptions and you need to change them. So a new process needs to be built. And for database migrations, we needed to build them in a way that's entirely asynchronous to the deployment process.
Yeah, I think this table captures a little bit of all of the activity that users do. And it was the first table we ran into that exhausted 32-bit integers, so we needed to flip the IDs to be bigints. We didn't think that would be a problem either. And it's leaps and bounds bigger than any of the other tables we have in our system.
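To put numbers on the 32-bit limit mentioned here, the following is a minimal sketch of the arithmetic. It assumes a signed integer ID column (the transcript does not say whether the column was signed or unsigned), and `ids_remaining` is a hypothetical helper, not something from the speaker's codebase.

```python
# Assumption: the ID column is a signed integer, so one bit is the sign bit.
INT32_MAX = 2**31 - 1  # ceiling for a signed 32-bit ID column
INT64_MAX = 2**63 - 1  # ceiling for a signed 64-bit (bigint) ID column


def ids_remaining(current_max_id: int, limit: int = INT32_MAX) -> int:
    """Return how many more IDs can be issued before the column overflows."""
    return max(limit - current_max_id, 0)


print(INT32_MAX)                     # 2147483647
print(INT64_MAX)                     # 9223372036854775807
print(ids_remaining(2_000_000_000))  # 147483647
```

A table that records every user action can reach roughly 2.1 billion rows far sooner than the other tables in a system, which is why it would be the first to hit this ceiling.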
That's actually a good question, and there are actually a lot of tools that exist, so we don't necessarily need to build this ourselves. GitHub has a tool called gh-ost. There's another tool by Percona, in the Percona Toolkit. It's something like online schema replacement [pt-online-schema-change]; I can't remember the exact name.
The general strategy is that, instead of changing a table with an ALTER TABLE, you actually create a brand new table and populate that table with various mechanisms. Some of the tools use triggers, some of them use the binary logs. You get the new table in sync with the original, and then do quick renames.
So you rename the old table to mark it as old, you rename the new table to take its place, and then new queries start flowing into this new table. And this can take as long as it needs; it's entirely non-blocking, but it has to be in a process that exists entirely outside of the deployment stack.
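The create, populate, and rename strategy described above can be sketched in a few lines. This is an illustration only: it uses Python's built-in `sqlite3` module rather than the production database, the table and column names (`activities`, `action`, `note`) are hypothetical, and the trigger handles inserts only, whereas real tools also have to carry over updates and deletes.

```python
import sqlite3


def create_shadow(conn: sqlite3.Connection) -> None:
    """Create the new table with the desired schema and keep it in sync."""
    # Hypothetical schema change: the new table gains a `note` column.
    conn.execute(
        "CREATE TABLE activities_new "
        "(id INTEGER PRIMARY KEY, action TEXT, note TEXT)"
    )
    # Trigger-based sync: rows written during the copy also land in the shadow.
    conn.execute(
        """
        CREATE TRIGGER activities_sync AFTER INSERT ON activities
        BEGIN
            INSERT OR REPLACE INTO activities_new (id, action)
            VALUES (NEW.id, NEW.action);
        END
        """
    )


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> None:
    """Copy existing rows in small batches so no single step runs for long."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, action FROM activities WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        # OR IGNORE skips rows the trigger has already copied.
        conn.executemany(
            "INSERT OR IGNORE INTO activities_new (id, action) VALUES (?, ?)",
            rows,
        )
        conn.commit()
        last_id = rows[-1][0]


def swap(conn: sqlite3.Connection) -> None:
    """Quick renames: the old table steps aside and the new one takes its name."""
    conn.execute("DROP TRIGGER activities_sync")
    conn.execute("ALTER TABLE activities RENAME TO activities_old")
    conn.execute("ALTER TABLE activities_new RENAME TO activities")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (id INTEGER PRIMARY KEY, action TEXT)")
conn.executemany(
    "INSERT INTO activities (action) VALUES (?)",
    [("login",), ("view",), ("logout",)],
)
conn.commit()

create_shadow(conn)
# A write that arrives while the copy is still in progress.
conn.execute("INSERT INTO activities (action) VALUES ('late write')")
backfill(conn, batch_size=2)
swap(conn)

rows = conn.execute("SELECT id, action FROM activities ORDER BY id").fetchall()
print(rows)
```

The slow part (the backfill) runs in small committed batches and never touches the schema of the live table; only the final renames need a brief moment of exclusivity. In MySQL the two renames can be issued as a single `RENAME TABLE` statement so that the swap is atomic.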
Some of them seem obvious. Like, I think if you're dropping a column or adding a column, that could potentially lock, but some of them are not. Like, if you changed a VARCHAR from a VARCHAR(100) to a VARCHAR(200) and you're just increasing it, does that lock? Maybe. I actually don't know off the top of my head. What if you change the character set? What if you change the collation? I don't know.
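One way to avoid guessing whether a given change locks is to ask the database to refuse the change if it cannot be done without blocking. The sketch below assumes MySQL with InnoDB (suggested by the mention of binary logs, but not stated outright), where `ALTER TABLE` accepts `ALGORITHM` and `LOCK` clauses and the statement fails with an error if the requested level cannot be honored. The helper `non_blocking_alter` and the table and column names are hypothetical.

```python
def non_blocking_alter(table: str, change: str) -> str:
    """Build an ALTER TABLE that errors out instead of silently locking.

    Assumes MySQL/InnoDB online DDL syntax. The inputs are interpolated
    directly, so pass only trusted, hard-coded identifiers.
    """
    return f"ALTER TABLE {table} {change}, ALGORITHM=INPLACE, LOCK=NONE"


# Hypothetical table and column names, for illustration only.
sql = non_blocking_alter("activities", "MODIFY note VARCHAR(200)")
print(sql)
# ALTER TABLE activities MODIFY note VARCHAR(200), ALGORITHM=INPLACE, LOCK=NONE
```

If the server rejects the statement, that is a signal to route the change through a shadow-table tool instead of running it directly.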