Q & A: Perspectives on AI in the data center, by Ajay Gulati

How can AI be used to automate and streamline data centers?
AI can be used to predict and automate several challenging tasks in a data center, such as:

  1. Finding anomalies in device behavior and monitoring statistics. For example, AI can flag a possible security breach by learning the typical rate of network data transmission and then raising an alert if that rate changes dramatically.
  2. Predicting that a device is likely to fail before the failure actually happens.
  3. Predicting load spikes and automatically scaling out services before they happen.
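
As a rough illustration of the first item, the sketch below learns a baseline from recent network transmit rates and flags samples that deviate sharply from it. It uses a simple rolling mean and standard deviation rather than any particular AI product's method, and the function name, window size, and threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def find_rate_anomalies(rates, window=20, threshold=3.0):
    """Return the indices of samples that deviate sharply from the recent baseline.

    rates     -- sequence of transmit-rate samples (for example, MB/s per minute)
    window    -- number of recent normal samples used as the baseline (assumed value)
    threshold -- how many standard deviations count as "dramatic" (assumed value)
    """
    history = deque(maxlen=window)
    anomalies = []
    for index, rate in enumerate(rates):
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            # A perfectly flat baseline has zero spread; skip the check in that
            # case rather than flag every tiny change.
            if spread > 0 and abs(rate - baseline) > threshold * spread:
                anomalies.append(index)
                continue  # keep the anomalous sample out of the baseline
        history.append(rate)
    return anomalies


if __name__ == "__main__":
    # Twenty samples of normal traffic, one sudden burst, then normal traffic.
    samples = [100, 102, 98, 101, 99] * 4 + [500, 100, 101]
    print(find_rate_anomalies(samples))
```

In practice the flagged indices would feed an alerting system, and a human operator would decide whether the change really indicates a breach.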

What are the potential benefits that can be obtained with AI data center automation?
AI techniques can significantly reduce problem resolution time, because they can monitor for and identify issues before they get worse and have a broader impact. AI can also help human operators do their jobs more effectively. By helping to resolve problems quickly, AI can help avoid long downtimes for applications, and it can lessen the impact of security breaches by detecting attacks.

What types of data centers most stand to benefit from AI?
Virtualized or cloud-enabled data centers are most likely to benefit from AI. These data centers have high degrees of consolidation and utilization, and their resources are pooled together to create automated, API-driven infrastructure. The rate of change in such data centers is much higher than in traditional ones, and the environment is highly dynamic. This makes them the most suitable for AI-based automation and problem-solving.

What’s the best way to begin using data center AI automation?
AI must be implemented gradually to build trust. IT administrators can start with a tool or solution that uses AI but hands its final conclusions or recommendations to a human, who can then apply those recommendations after verifying that they make sense. Another way to start is to use AI to analyze data and surface meaningful insights. Right now, most tools show a lot of data but leave the analysis to infrastructure admins and operators.

What are the pitfalls to look out for?
Don’t trust any AI-based system blindly: these systems can learn the wrong things from bad data. It’s important to wait until the system has been trained on enough data samples, and to track a metric that measures the accuracy of its predictions, such as the rate of false positives and false negatives.
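
One way to track such a metric is to compare the alerts a system raised against what operators later confirmed. The sketch below is a minimal example, assuming each alert and each confirmed incident is recorded as a boolean per time slot; the function name and the returned field names are hypothetical.

```python
def alert_quality(predicted, actual):
    """Summarize how well raised alerts matched confirmed incidents.

    predicted -- list of booleans, True where the system raised an alert
    actual    -- list of booleans, True where an incident was confirmed
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)

    # Precision: the share of raised alerts that were real incidents.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: the share of real incidents that the system caught.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0

    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "true_negatives": true_neg,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    raised = [True, True, False, False, True]
    confirmed = [True, False, False, True, True]
    print(alert_quality(raised, confirmed))
```

Watching precision and recall over time gives a concrete basis for deciding when the system has earned enough trust to act with less human review.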

What’s the long-term outlook for AI-based data center automation?
AI is going to become more and more prevalent in data centers. We already use AI in many other domains, such as transportation, finance, and medical science, so it is only natural that we use it for data centers, including the machines that run AI workloads themselves. This is a simple case of recursion, a well-known concept in computer science.

Final thoughts
The main requirement for developing AI-based solutions is to collect meaningful data in one place. Currently the data is distributed across customers and device vendors, and this lack of pooled data makes it harder to develop effective AI-based solutions. Some startups are aggregating data from a large number of clouds or data centers in one place so that they can build good AI models on top of it. At ZeroStack, we are on a similar mission and have already been collecting data for more than a year. Having this data is one of the key requirements for success in this area. Beyond collecting data, it would help to have techniques to anonymize and share stats, metrics and other information across public and private data center deployments.
