Assaf Resnick
is a potential problem that means my big customer who's got serious SLAs is not getting the quality of service that they expect.
That's a really big problem.
And, uh,
The amount of stuff that those engineers have to look at has become a big data issue.
It's both.
It's very much both.
So right now, the way it is: imagine you're in that war room.
You've got 70,000 different alerts.
And 1% of those alerts are really important.
And the other 99% are just noise.
Go filter through that in any reasonable time.
So what we do is we use a lot of machine learning and dynamic clustering to say, hey, Mr. or Mrs. IT operations, you don't really have 70,000 alerts.
You have just a handful of core issues with lots of symptoms, but you don't have to be a data scientist to figure that out.
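To make the idea of collapsing many alerts into a few core issues concrete, here is a minimal sketch. It is not the speaker's machine learning or dynamic clustering method; it swaps in a plain grouping by cluster name and time window. The field names (`host`, `cluster`, `timestamp`, `symptom`) and the 300-second window are assumptions chosen for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, window_seconds=300):
    """Group alerts that share a cluster and fall into the same time bucket.

    A simple stand-in for clustering: each group is treated as one
    candidate issue, and its members are the symptoms.
    """
    groups = defaultdict(list)
    for alert in alerts:
        # Integer division assigns each alert to a fixed-width time bucket.
        bucket = alert["timestamp"] // window_seconds
        groups[(alert["cluster"], bucket)].append(alert)
    return dict(groups)


# Hypothetical sample alerts (timestamps are seconds from an arbitrary start).
alerts = [
    {"host": "server-123", "cluster": "db", "timestamp": 10, "symptom": "high memory"},
    {"host": "server-124", "cluster": "db", "timestamp": 40, "symptom": "low disk IO"},
    {"host": "server-201", "cluster": "web", "timestamp": 900, "symptom": "high memory"},
]

issues = group_alerts(alerts)
for (cluster, bucket), members in issues.items():
    print(cluster, bucket, [m["host"] for m in members])
```

Here three raw alerts reduce to two candidate issues. A real system would likely use richer signals than cluster name and arrival time.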
Okay.
So...
I've got an alert that says that server number 123 has used up the majority of its memory, or it's got low disk IO, or something that's a problem.
Now, is that a problem?
Well, it's a problem if lots of other servers in that cluster are also experiencing problems.
And then it's only a problem if that cluster services some kind of critical application.
And then it's really only a problem if a customer who relies on that application is having a crappy user experience.
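The chain of conditions just described can be sketched as a short function. This is an illustration under stated assumptions, not the product's actual logic: the `ClusterState` fields, the severity labels, and the 30% threshold for "lots of other servers" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    total_servers: int
    alerting_servers: int            # servers in the cluster currently alerting
    serves_critical_app: bool        # does the cluster back a critical application?
    customer_experience_degraded: bool


def severity(state, min_alerting_fraction=0.3):
    """Walk the conditions in order; each one must hold to escalate further."""
    # One server alerting on its own is treated as noise (threshold is an assumption).
    if state.total_servers == 0:
        return "noise"
    if state.alerting_servers / state.total_servers < min_alerting_fraction:
        return "noise"
    # Many servers alerting, but nothing critical depends on the cluster.
    if not state.serves_critical_app:
        return "low"
    # A critical application is affected, but customers have not noticed yet.
    if not state.customer_experience_degraded:
        return "watch"
    # Every condition holds: a customer is having a bad experience.
    return "critical"


print(severity(ClusterState(10, 1, True, True)))
print(severity(ClusterState(10, 5, True, True)))
```

The ordering mirrors the reasoning above: a single server's memory alert only matters once the cluster, the application, and the customer are all affected.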