Anomaly detection, also called outlier analysis, is the practice of uncovering and analyzing anything in a dataset that deviates from the norm. Anything that falls outside of “business as usual” is anomalous data, or an outlier. These rare events may signal deviations that are important to understand fully in order to improve or protect your business from serious issues such as machine failure, fraud, or hacking.
Anomaly detection is an important application of data science, and one that can uncover both mistakes and opportunities. A data point in a dataset that skews results may indicate a technical hiccup, for example, while an outlier in another set of data may indicate a change in buyer behavior that a company can respond to. Continue reading to learn more about anomaly detection in AI.
Anomalies are neither intrinsically good nor bad; everything depends on context. A data anomaly, or outlier, simply indicates a break in the pattern expected from the data for a particular metric or key performance indicator.
Anomaly detection in AI projects uncovers unexpected deviations in datasets. For example, online retailers always see a spike in sales on Cyber Monday; sales volumes on that day typically make it the biggest ecommerce sales day of the year. That spike is not an anomaly. It is the pattern. An anomaly would be an online retailer that didn’t see a spike in sales on Cyber Monday. That would fall outside the norm.
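To make the idea concrete, here is a minimal sketch of how a deviation from an expected pattern might be flagged statistically. It uses a simple z-score test against hypothetical historical sales figures (the function name, threshold, and all numbers are illustrative assumptions, not a production method):

```python
import statistics

def is_anomalous(value, history, threshold=3.0):
    """Return True if `value` deviates from `history` by more than
    `threshold` standard deviations (a simple z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Hypothetical Cyber Monday sales (in thousands) from past years.
past_cyber_mondays = [410, 455, 470, 500, 525]

print(is_anomalous(530, past_cyber_mondays))  # False: fits the pattern
print(is_anomalous(120, past_cyber_mondays))  # True: the missing spike
```

Real systems use far richer models (seasonality, trend, multivariate signals), but the underlying question is the same: does this observation break the expected pattern?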
The goal of anomaly detection with big data and AI projects isn’t data—it’s decisions. Your aim is not to simply detect exceptions and peculiarities in your datasets. Your goal is to gain insights that lead to decisions that drive value.
For example, a manufacturer applies anomaly detection to a dataset and discovers that a piece of equipment is likely to fail prematurely. The manufacturer replaces the equipment before it causes a catastrophic unplanned shutdown in production. Or a bank applies anomaly detection to a dataset and notices an unexpected increase in fraudulent transactions that indicates an insider threat. The bank takes steps to fire the employee and charge them with theft.
When training an anomaly detection algorithm, it is critical to include a human-in-the-loop component, where people familiar with the data review the anomalies flagged by the model to evaluate its effectiveness.
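One lightweight way to quantify such a review is to have reviewers label each flagged record as a genuine anomaly or a false alarm, then measure how often the model’s flags were confirmed. The sketch below assumes hypothetical record IDs and reviewer labels; it is an illustration of the review step, not a prescribed workflow:

```python
def review_precision(flagged, reviewer_labels):
    """Precision of the model's flags: the fraction of flagged records
    that human reviewers confirmed as genuine anomalies."""
    if not flagged:
        return 0.0
    confirmed = sum(1 for record in flagged if reviewer_labels.get(record, False))
    return confirmed / len(flagged)

# Hypothetical flags from a model and verdicts from human reviewers.
flags = ["txn_101", "txn_204", "txn_309", "txn_412"]
labels = {"txn_101": True, "txn_204": False, "txn_309": True, "txn_412": True}

print(review_precision(flags, labels))  # 0.75
```

Tracking a number like this across review rounds gives the team an evidence-based signal of whether the model is ready for production.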
By consistently performing these kinds of reviews prior to deploying models into production, we can uncover potential bias in AI models. Bias creeps into data, AI models, and outcomes. Pre-existing biases and lack of representation mean that AI can benefit or harm one group over another.
When you take an IQ test, you are often required to complete a visual patterns test. You are shown a sequence of geometric objects, pictures, or shapes that are generated according to a set of rules; they follow a pattern. But one of the items breaks that pattern.
If you can rapidly spot the pattern every time, and always identify the outlier, you are a genius. If you can do this same exercise in seconds with hundreds of thousands of images, day in, day out, you are a trusted AI model—one that uses anomaly detection to deliver trusted results.
Stay up-to-date on the latest in trusted AI and data science by subscribing to our Voices of Trusted AI monthly digest. It’s a once-per-month email that contains helpful trusted AI resources, reputable information and actionable tips straight from data science experts themselves.
Bob Wood is a Data Scientist at Pandata.