"Association rule mining is a rule-based approach to reveal interesting relationships between data points in large datasets. Unsupervised learning algorithms search for frequent if-then associationsāalso called rulesāto discover correlations and co-occurrences within the data and the different connections between data objects."[1]
"It is most commonly used to analyze retail baskets or transactional datasets to represent how often certain items are purchased together. These algorithms uncover customer purchasing patterns and previously hidden relationships between products that help inform recommendation engines or other cross-selling opportunities. "[1]
"In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behaviour. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data."[2]
Anomaly detection finds application in many domains including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud to name only a few." [2]
"A suspicious event might indicate a network breach, fraud, crime, disease or faulty equipment. An unexpected opportunity could involve finding a store, product or salesperson that's performing much better than others and should be investigated for insight into improving the business."[3]
Anomaly detection algorithm techniques [4]
- Isolation forest
- Local outlier factor
- Robust covariance
- One-class support vector machine (SVM)
- One-class SVM with stochastic gradient descent (SGD)
Here are short descriptions for each anomaly detection algorithm technique:
- Isolation Forest
Isolation Forest is a technique that leverages the idea that anomalous data points tend to be isolated from normal patterns. It achieves this by recursively partitioning the data space until each point is uniquely isolated, with more partitions indicating a higher likelihood of being an outlier. - Local Outlier Factor (LOF)
Local Outlier Factor (LOF) is an algorithm that measures each data point's local reachability density to identify anomalies. It compares a point's local reachability density to its neighbors to determine if it is an outlier, making it suitable for handling high-dimensional datasets without requiring specific distribution or shape assumptions. - Robust Covariance
Robust Covariance, specifically implemented using Minimum Covariance Determinant (MCD), is a technique that excels in identifying clusters and detecting outliers in high-dimensional datasets. It calculates a compact region containing most data points and considers any point outside this boundary as anomalous, making it a reliable choice for datasets with varying distributions. - One-Class Support Vector Machine (SVM)
One-Class Support Vector Machine (SVM) is an algorithm designed for unlabeled datasets dominated by normal instances. It learns a hyperplane that effectively separates the majority class from the origin, and any data point falling outside this margin is considered an outlier. - One-Class SVM with Stochastic Gradient Descent (SGD)
One-Class SVM with Stochastic Gradient Descent (SGD) is an optimized version of the One-Class SVM algorithm. It uses SGD to streamline its performance, allowing for efficient processing of larger datasets while maintaining its ability to timely detect anomalies, making it a powerful tool in modern anomaly detection applications.
For example use cases of AI Anomaly Detection, see IBM use cases.
Sources:
[1] What is unsupervised learning?, Google Cloud
[2] Anomaly detection, Wikipedia
[3] Anomaly detection, techtarget.com
[4] 5 Anomaly detection algorithm techniques, builtin.com
[5] Isolation Forest: How do you detect anomalies in a dataset?, datascientest.com
[6] Anomaly Detection Using Isolation Forest in Python, paperspace.com
[7] Isolation Forestā: The Anomaly Detection Algorithm Any Data Scientist Should Know, towardsdatascience.com
[8] Isolation forest, Wikipedia
[9] Anomaly detection using Isolation Forest, geeksforgeeks.org