Justin Garrison

👤 Person
452 total appearances

Podcast Appearances

And you have a resource allocator that does that, as far as I understood. How does that affect what you're doing? You have a set of host profiles. You say, hey, you can pick from a menu. And then we know how to switch between them. How does that typically work?

How does that affect you as the OS team? Like, is there anything that you're doing specifically for that?

Yeah, they had them at scale and I was very jealous because they're cool. And this is an audio podcast, so no one knows what we're talking about. But basically, it's a bunch of little Tuxes inside the hood of the hoodie.

That's something you have to wait for, though, right? You're like, we're going to write this internally. We're going to hope this gets upstreamed. And then we have to either wait for the release to consume it, or we're just going to keep running it. But then if upstream needs changes, you have to kind of merge back to it.

How do releasing frequently and a million hosts go together? Because you mentioned that it takes about a year to basically roll out an update to every host. But if you're pushing out updates to the OS every month, then you have 12 different stages of things that are going through release. And that makes it really hard to debug and predict, oh, what version are you on?

Did we fix that bug somewhere else? How do you manage that?

You mentioned an AI fleet. From what I've heard Zuckerberg talk about, Meta has more GPUs than anyone else in the world, basically. How do you manage that? Not only how are the drivers installed, because Linux and NVIDIA aren't always known to be the best friends, but then how do you isolate those things and roll out those changes?

Under TW Shared, do they just show up as a host profile? Or is that like, do I get an entitlement that says I need GPUs for this type of workload?

Okay, that's interesting. One thing I found fascinating about some of the talks you've given and information is the fact that Meta is still notably an on-prem company. You have your own data centers, you have your own regions, you have machines, and it doesn't seem like you try to hide that from people. You don't try to abstract it away.
