Predictive AI for Reliability and Security

In the computer systems field, we use AI, or its trendier cousin, machine learning, for predictive reliability and security all the time. This means we act in anticipation of a computer system's failure so that it continues to function reliably and securely. The question is whether this translates beyond the cyber world to the human world. Should we try to develop and apply similar predictive principles in both worlds?

One common example in the digital world is a machine learning algorithm that predicts whether a machine or a network router is going to fail, based on crunching through lots of data. Our digital worlds are chock-full of monitors and sensors, which provide all different kinds of measurements at fine time scales. For example, did a server in the data center, or a rack of servers, heat up this one tiny bit? Has the queue of packets at a network router in the data center been growing for the past second? There are lots of open source (and free) tools for monitoring [WWW1, WWW2, WWW3], as well as commercial tools from companies big and small.

Automation of the management of our digital assets has been increasing for years now. It is at a point now where many decisions to react to anomalies that the monitors find are taken automatically. For example, if the rack of servers is heating up, let us turn up the knob on our HVAC to direct its cooling power to the affected rack. If the load on a web server has gone up, let us spin up another copy of the server and direct a fraction of the requests to it.
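The web server example can be sketched as a simple scaling rule. This is a minimal illustration, not how any particular data center does it: the function name `desired_replicas`, the per-replica capacity, and the replica bounds are all assumptions made up for this sketch.

```python
import math


def desired_replicas(requests_per_sec: float,
                     capacity_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Return how many server copies the observed load calls for.

    All parameters are hypothetical: capacity_per_replica is the request
    rate one copy is assumed to handle comfortably.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Round up so that the last partial chunk of load still gets a server.
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    # Never scale below the floor or above the ceiling.
    return max(min_replicas, min(max_replicas, needed))


# Load rose to 250 requests/sec and each copy is assumed to handle 100.
print(desired_replicas(250, 100))
```

The same rule scales back down when the load drops, since the answer depends only on the current measurement.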

This topic has been an active area of research in the computer systems community and is seeing increasing deployment in data centers and elsewhere. The general workflow of the solutions is:

1. Collect lots of metrics continuously.
2. Analyze the metrics for signs of impending failures. This is where ML algorithms come into play.
3. If such a sign shows up, analyze the metrics to determine which components may be the culprits. Again ML algorithms weave some of their magic.
4. Reconfigure/reload/replace the offending components.
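To make the workflow concrete, here is a toy sketch of the four steps. It swaps the ML algorithms for a plain z-score test (how many standard deviations the newest reading sits from its own history), which is an assumption made for brevity; real deployments use far richer models. The names `find_suspects` and `remediate`, the threshold of 3.0, and the sample readings are all invented for illustration.

```python
from statistics import mean, stdev


def zscore(history, latest):
    """Standard deviations between `latest` and the mean of `history`."""
    if len(history) < 2:
        return 0.0
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history gives no scale to judge by; this toy
        # version simply reports "not anomalous" in that case.
        return 0.0
    return (latest - mean(history)) / sigma


def find_suspects(metrics, threshold=3.0):
    """Steps 2 and 3: flag anomalous components, most anomalous first.

    `metrics` maps a component name to its readings, oldest first
    (step 1, the collection, is assumed to have happened already).
    """
    scores = {}
    for name, readings in metrics.items():
        if len(readings) < 3:
            continue  # too little history to judge
        *history, latest = readings
        score = abs(zscore(history, latest))
        if score >= threshold:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


def remediate(component):
    """Step 4 placeholder: a real system would reconfigure or restart."""
    return f"restart {component}"


# Hypothetical rack temperatures; rack-1 jumps on the last reading.
readings = {
    "rack-1": [40, 41, 40, 42, 41, 40, 60],
    "rack-2": [40, 41, 40, 42, 41, 40, 41],
}
for suspect in find_suspects(readings):
    print(remediate(suspect))
```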

Some specific kinds of problems where this line of work has been successful are: load balancing issues (when load spikes, detect it and spin up new replicas to handle the increase; correspondingly, remove replicas when load comes down), resource leak problems (e.g., a program keeps allocating memory and does not release it, so down the road the machine is going to run out of memory and all hell will break loose; the fix is to force the program to shed unused memory), and deadlock kinds of failures (in a distributed system, some process P1 wants some resource R1, which unfortunately another process P2 is currently hogging; P2 cannot release R1 because it simultaneously needs R2 to get its job done and, as luck would have it, P1 is holding on to R2).
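Two of these problems lend themselves to short sketches. The first function fits a straight line to memory readings and projects when a limit would be hit, a deliberately simple stand-in for a learned predictor. The second looks for a cycle in a "waits-for" map, which is the P1/P2 situation described above. Both function names and the data shapes are assumptions made for this illustration, and the deadlock check assumes each process waits on at most one other process.

```python
def seconds_until_exhaustion(samples, limit):
    """Project when memory use reaches `limit` via a least-squares line.

    `samples` is a list of (timestamp_sec, bytes_used) pairs, oldest
    first. Returns seconds after the last sample, or None if usage is
    not growing (no leak visible in this window).
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None
    slope = sum((t - mean_t) * (m - mean_m) for t, m in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_m - slope * mean_t
    t_limit = (limit - intercept) / slope
    return max(0.0, t_limit - samples[-1][0])


def find_deadlock(waits_for):
    """Return the processes forming a waiting cycle, or None.

    `waits_for` maps a process to the process holding what it wants.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of "who am I waiting on" until it ends or loops.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return None


# P1 waits on P2 (for R1) while P2 waits on P1 (for R2).
print(find_deadlock({"P1": "P2", "P2": "P1"}))
```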

Now take a wild leap and apply the predictive principle to the human world, again to increase the reliability or security of “systems”. Consider two scenarios.

Suppose chatter on social media has picked up some troubling signs that some religious or ethnic group is going to be targeted. There are posts, pictures, recipes for criminal activity, the works. It is our magical ML algorithms that have crunched through all the various types of data and come up with a disconcertingly high number that quantifies the likelihood that in the next few days, there will be some act of violence against this group. So what does law enforcement do?

Scenario 1: It increases the police presence in the area where the members of the group are concentrated and around the house of worship of that group. The increased presence is noticeable and in the good news version of the story, this dissuades the crime. The ML algorithms again pick up the chatter and the quantitative likelihood of the violent incident reassuringly drops.

Scenario 2: The results from the ML algorithms are queried further, through a mix of human and automated (again some magical number crunching ML algorithm) means, to determine a set of likely perpetrators of the anticipated crime. They are then rounded up and taken into custody. The ML algorithms now crunch through the chatter and the quantitative likelihood of the violent act drops. But through their sentient sentiment analysis algorithm, they also pick up an upsurge in resentment in the community from which the people have been picked up.

Which of the scenarios (if any) would you like to happen? And which of these happens today?

Only you know the answer to the first question.

The answer to the second question is that scenario 1 is quite routine in the US today. One dominant software product, PredPol, is used today by more than 60 police departments around the country. It identifies areas in a neighborhood where serious crimes are more likely to occur during a particular period. The US influence is spreading beyond our borders too. For example, Yokohama in Japan is spending top research dollars to build predictive policing algorithms, hoping to put them in place by the 2020 Olympic Games. Scenario 2 was anticipated in fiction, way back in a 1956 short story by Philip K. Dick. You may not have read the story, but you have likely seen the movie based on it, “Minority Report”. This scenario has not made its way into reality yet, and mercifully so.

You can guess that a workflow similar to the one in the digital world is being applied to predictive law enforcement, with the big difference that people’s sense of well-being, or worse, liberty, may be affected by the results of the algorithm. Even without the dystopian scenario 2 being a reality, can increased police presence lead to more arrests? I believe rational, well-meaning people can disagree on whether scenario 1 is desirable. On the plus side, it can prevent a group, or an individual, from becoming the victim of a violent act, without fingering anyone simply due to the verdict of an ML algorithm.

To me, a cause for concern lies elsewhere. We have to guess what is going on in these algorithms because they are not open source. You or I or experts in ML cannot look at the software, run it through its paces, or try different kinds of failure analysis to see where it breaks. And that is a crucial shortcoming. In the reliability and security world, it is a near unanimous verdict of the technical community that open source software is the way to go [WWW1, WWW2] (though there are some thoughtful critics too). Even the stodgy federal government came out with a directive in 2016 that requires government agencies to release 20 percent of any new code they commission as open source software. More eyeballs looking at the code and evaluating it bring out its defects faster and lead to improvements faster. This same principle is lacking in the newly emerging domain of ML algorithms being used in predictive mode for controlling the human world.

So let us not try to stop the progress of predictive ML algorithms (a complete lost cause in my mind) but rather regulate how they can be used in the human world. Technologists often have an instinctive aversion to regulation, but regulation has its place in setting down incentives and deterrents. For example, we can mine genetic data to determine our predisposition for various diseases, which is the very business case for 23andMe and other such companies. But due to an act passed in 2008, in the early days of the field (the Genetic Information Nondiscrimination Act, or GINA), employers or insurance providers cannot use this data.

It is very much within the realm of technical possibility to use ML algorithms to make aggregate decisions but not “invert the function” to identify individuals (as potential offenders). So throw in a large bunch of features (tweets, social media posts, posted pictures, etc., each by a large number of individuals) and the ML algorithm comes up with an aggregate predicted number (say, for some criminal activity). Today’s algorithms do this very well, but inverting the model to identify which features had how much effect on the final answer is still a research challenge. This technological gap helps to prevent the dystopian scenario 2 today. But the gap will be bridged soon, and then we will need clear regulation.

Looking forward to the world where a lot more things will be technologically possible than are societally desirable

In the end, security in the physical world will be enhanced through technological progress, including predictive ML algorithms. But we have to tread carefully to determine at what level such algorithms should be applied.
