Dr. Aqib Rashid
๐ค SpeakerAppearances Over Time
Podcast Appearances
So then obviously you then have to try to figure out, okay, we've proved the science of this a little.
Now we have to try and prove this out as an end-to-end commercially viable product.
So commercially viable product means, okay, you can achieve
specific true positive rates or detection rates or specific false alarm or false positive rates to make this useful for customers, but equally aspects such as inference time or round trip time, the deployment requirements and all that kind of stuff suddenly come into play there.
So the early phases of this at the MVP stage, when this transitioned from being a research project effectively into a product that we sought to develop at the time, involved making some deliberate decisions around the types of model architectures, the types of files that we would support, aspects related to explainability of models, because this product is intended to work as well as
in a highly regulated mission-critical environment, but also maybe in an enterprise SaaS solution as well.
So you might not always have access to the internet, you might not always have the ability to call to home, so to speak,
you will have the requirement for transparency and auditability.
When you're doing this at scale, you need to have the efficiency of millisecond scale inference.
So making a decision on file in milliseconds.
So that tolerance for sitting around and waiting for a file to come back with its verdict just isn't there at this kind of scale.
So understanding all of these different aspects of what customers would want and what would uniquely position this in a certain period of time once we had developed this was paramount for us at that early stage.
And of course, the different file types themselves, they are equally
diverse, but also equally complex in their nature.
So PDFs are radically different to Microsoft Word documents, which are different to Microsoft Excel documents, which are different to images.
So we have to make certain decisions at certain points in this entire process to try and bring about a useful product on all fronts there.
The maturation or the development process is all about taking it from the hypothesis to production discipline.
So, of course, early on, it was all about ensuring and validating that the signal that we are training on there, which is that CDR telemetry that is
produced by GlassWool actually worked for the problem of detecting malware and predicting malware.
But then taking it into production and maturing this further and building this into a roadmap basically meant bringing in that ML DLC, so Machine Learning Development Lifecycle, and the SDLC, and all the engineering rigor that comes with it.