Justin Garrison
Podcast Appearances
At least I haven't ever seen a reference to, like, "it's our internal cloud." No, it's a pool of machines, and people run stuff on the machines. And the software and the applications running on top of it are very much like, this is just a systemd unit, you're just running it containerized. What other types of services do you have internally that people need? I mean, I saw references to things like sharding, like, hey, we need fast disk somewhere, and we need some storage and databases externally. But what are the pieces that you find are common infrastructure for people to use?
Right, how those shards work, making sure enough of the data replicas are available.
Typical DBAs. What would you say are some of the challenges you're facing right now on the OS team, or just in general in the infrastructure?
And you mentioned moving more things out of traditional configuration management like Chef and into more of a host-native binary that can manage things, I don't want to say more flexibly, but I guess more predictably. I think you mentioned that, where it's just like, yeah.
You also mentioned, again referencing the paper, because I just recently read it: all of your hosts are the same size, right? It's all one CPU socket, and I think it was 64 gigs of RAM or something like that.
Okay. So the majority of twshared is like, we have one size, and everyone fits into this one size and we'll see how we can make that work. Right? Because you can control the workloads, or at least help them optimize in certain ways, because not all AI jobs or big data jobs are going to fit inside of that envelope.
So yeah, it's more of a bin-packing approach, and saying, hey, maybe we do have some large hosts, especially, I'm assuming, for the jobs that say, hey, I don't fit in 64 gigs of RAM, or local NVMe isn't fast enough for whatever reason, or it's going to cause the job to run longer.
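The bin-packing idea mentioned here can be sketched with a simple first-fit-decreasing heuristic. This is an illustrative toy, not the scheduler discussed in the episode or the paper: the 64 GB host capacity, the job sizes, and the overflow handling for jobs too big for a standard host are all assumptions for the example.

```python
# Illustrative sketch: first-fit-decreasing bin packing of job memory
# demands onto uniform 64 GB hosts. All numbers are made up for the
# example; this is not the actual scheduler described in the episode.

HOST_RAM_GB = 64  # assumed single fleet-wide host size

def pack_jobs(job_ram_gb):
    """Place each job on the first host with room, largest jobs first.
    Jobs larger than a standard host are set aside for a hypothetical
    'large host' pool instead."""
    hosts = []       # each host is a list of job sizes placed on it
    oversized = []   # jobs that cannot fit a standard host at all
    for job in sorted(job_ram_gb, reverse=True):
        if job > HOST_RAM_GB:
            oversized.append(job)
            continue
        for host in hosts:
            if sum(host) + job <= HOST_RAM_GB:
                host.append(job)
                break
        else:
            hosts.append([job])  # no existing host fits: open a new one
    return hosts, oversized

hosts, oversized = pack_jobs([48, 30, 20, 16, 96, 8, 8, 4])
print(len(hosts))   # standard hosts needed for the jobs that fit
print(oversized)    # jobs that overflow the standard size
```

Sorting largest-first tends to leave small jobs to fill the gaps, which is why first-fit-decreasing usually beats plain first-fit; the trade-off of a single host size is that anything outside the envelope needs special handling, as the conversation goes on to note.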
Yeah, I suppose once you have any sort of specialized compute or interface, whether that's network or some fabric adapters, you always have snowflakes in some way. It's like, hey, this is different than the general compute stuff.
How has that affected your global optimization around things? I know, again, the paper is old now; it was published around 2020, I think, which is probably looking at 2018 or 2019 data. But in general, it was something like 18% overall total cost optimization from moving to single-size hosts, because your power draw was less overall globally.