Anomaly detection looks for unexpected patterns in data. Depending on the application domain, these patterns are called anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants. This article discusses anomaly detection and the significance of artificial intelligence (AI)-based methods in this field.
Anomaly Detection Basics
Anomaly detection is a critical data analysis task that identifies anomalous data in a dataset. Anomalies often indicate rare but significant events and can prompt critical actions in several application domains. For instance, an unusual traffic pattern in a network may indicate that a computer has been hacked and is transmitting data to unauthorized destinations.
Anomalies are commonly categorized as point, contextual, and collective anomalies. A point anomaly is the simplest type and refers to a single data instance that deviates from the normal pattern of the dataset. A data instance that behaves anomalously only in a particular context is called a contextual (or conditional) anomaly.
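The difference between a point anomaly and a contextual anomaly can be illustrated with a minimal sketch. The temperature readings, the month used as context, and the threshold of three standard deviations are all assumptions made for this example; they do not come from the surveys referenced in this article.

```python
from statistics import mean, pstdev

def zscore(value, reference):
    """Distance of value from the reference mean, in standard deviations."""
    mu, sigma = mean(reference), pstdev(reference)
    return abs(value - mu) / sigma if sigma > 0 else 0.0

# Hypothetical (month, temperature in degrees Celsius) readings.
readings = [
    ("jan", 2.0), ("jan", 3.0), ("jan", 1.0), ("jan", 4.0), ("jan", 2.5),
    ("jul", 29.0), ("jul", 31.0), ("jul", 30.0), ("jul", 28.0), ("jul", 32.0),
]
THRESHOLD = 3.0  # assumed cut-off, in standard deviations

def is_point_anomaly(value):
    """Compare the value against all readings, ignoring context."""
    return zscore(value, [t for _, t in readings]) > THRESHOLD

def is_contextual_anomaly(month, value):
    """Compare the value only against readings that share its context."""
    return zscore(value, [t for m, t in readings if m == month]) > THRESHOLD

print(is_point_anomaly(30.0))              # 30 degrees is within the overall range
print(is_contextual_anomaly("jan", 30.0))  # but unusual for January
print(is_contextual_anomaly("jul", 30.0))  # and ordinary for July
```

A reading of 30 degrees is unremarkable against the whole dataset, so a context-free check does not flag it; it stands out only when compared with other January readings.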
A collective anomaly occurs when a collection of related data instances behaves anomalously with respect to the rest of the dataset, even if the individual instances are not anomalous on their own. Detection techniques also differ in their output. Scoring-based techniques assign an anomaly score to every data instance; an analyst then either ranks the scores and selects the top-ranked instances or applies a threshold. Binary (label-based) techniques instead classify every instance as either normal or anomalous.
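The two output styles can be sketched in a few lines of Python. The scoring function below (absolute deviation from the median, scaled by the median absolute deviation) is only one possible choice and is an assumption of this example, as are the data values and the threshold.

```python
from statistics import median

def anomaly_scores(values):
    """Score each value by its deviation from the median, scaled by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = mad if mad > 0 else 1.0  # avoid division by zero on constant data
    return [abs(v - med) / scale for v in values]

def top_k(scores, k):
    """Scoring output: indices of the k highest-scoring instances, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def label(scores, threshold):
    """Label output: True (anomalous) or False (normal) for every instance."""
    return [s > threshold for s in scores]

values = [10.1, 9.8, 10.3, 10.0, 25.0, 9.9, 10.2, 3.0]  # hypothetical measurements
scores = anomaly_scores(values)

print(top_k(scores, 2))    # ranked view: the analyst inspects the top 2
print(label(scores, 5.0))  # binary view: a threshold decides for every instance
```

The ranked view lets the analyst decide how many instances to inspect, while the labeled view fixes that decision in advance through the threshold.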
Anomaly detection is employed in many application domains, such as astronomical data, robot behavior, sensor networks, image processing, industrial damage, intrusion detection, fraud detection, and public health. Intrusion detection looks for malicious activity in computer systems, such as penetrations and break-ins. In this form of anomaly detection, the massive volume of data is the biggest challenge.
The anomaly detection techniques must be computationally efficient to handle these large inputs. Additionally, online analysis is necessary because the data arrive as a stream. Fraud detection involves the detection of criminal activities in commercial organizations, such as the stock market, banks, cell phone companies, credit card companies, and insurance agencies. The malicious users can be actual customers of the organization or people posing as customers.
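The requirement for efficient online analysis of streaming data, noted above for intrusion detection, can be illustrated with a minimal sketch. It keeps a running mean and variance using Welford's online update, so each new value costs constant time and memory. The class name, the warm-up length, the threshold, and the sample stream are all assumptions of this example, not a method taken from the referenced surveys.

```python
import math

class StreamingDetector:
    """Flags values far from the running mean, using O(1) work per item."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.warmup = warmup        # observations needed before flagging starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the statistics."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(x - self.mean) / std > self.threshold
        # Welford's online update of the mean and squared-deviation sum.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged

detector = StreamingDetector()
stream = [100, 102, 98, 101, 99, 103, 500, 100, 97]  # hypothetical measurements
flags = [detector.update(x) for x in stream]
print(flags)
```

In this sketch every value, including a flagged one, is folded into the statistics, which keeps the code short but lets a large anomaly inflate the variance for later items. Excluding flagged values from the update is a reasonable alternative.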
In the medical and public health domains, anomaly detection involves detecting anomalies in patient records that result from recording errors, instrumentation errors, or abnormal patient conditions. Because misclassifying an anomaly as normal can be very costly, avoiding such misclassifications is the greatest challenge in this domain.
Classification-based Network Anomaly Detection
Classification-based techniques depend on the extensive knowledge of experts about network attack characteristics. A network expert provides the attack characteristics details to the detection system, based on which an attack with a known pattern is detected whenever it is launched.
Classification-based techniques, specifically multi-class techniques, can leverage powerful algorithms that differentiate between instances belonging to various classes. Additionally, the testing phase of classification-based techniques is fast because every test instance is compared against a pre-computed model.
Classification-based techniques build a knowledge base from a profile of normal traffic activity and classify activities that deviate from this baseline profile as anomalous. Key classification-based network anomaly detection techniques include support vector machines (SVMs), Bayesian networks, neural networks, and rule-based methods.
SVM-based Approaches: The basic SVM principle is to derive a hyperplane that maximizes the separating margin between the negative and positive classes. Although the standard SVM is a supervised learning technique, it can be adapted as an unsupervised learning technique.
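The margin-maximization principle is usually written as the following optimization problem, shown here in its standard soft-margin form. The notation is an assumption of this sketch: x_i are the n training instances, y_i ∈ {-1, +1} are their class labels, w and b define the hyperplane, ξ_i are slack variables that tolerate margin violations, and C > 0 trades margin width against those violations.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
```

The width of the margin is 2/‖w‖, so minimizing ‖w‖² is equivalent to maximizing the separation between the two classes.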
In one study, the unsupervised SVM concept was used for anomalous event detection. The algorithm looks for a hyperplane that separates the data instances from the origin with the maximal margin; the best hyperplane is determined by solving an optimization problem, which can be done with a variant of the sequential minimal optimization algorithm.
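Separating data from the origin with maximal margin is commonly expressed through the ν-parameterized one-class SVM formulation below. It is given as a sketch under the assumption that the study follows this widely used form. The symbols are assumptions of the sketch: Φ is the feature map induced by the kernel, ρ is the offset of the hyperplane from the origin, ξ_i are slack variables, n is the number of training instances, and ν ∈ (0, 1] bounds the fraction of training instances treated as outliers.

```latex
\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{subject to}\quad
w^{\top}\Phi(x_i) \ge \rho - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
```

```latex
f(x) = \operatorname{sgn}\bigl(w^{\top}\Phi(x) - \rho\bigr)
```

Under this formulation, an instance with f(x) = -1 falls on the origin side of the hyperplane and is reported as anomalous.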
Another approach, registry anomaly detection (RAD), applied the one-class SVM concept in a supervised manner to monitor Windows registry queries and detect anomalies in registry access. A separate anomaly detection method was developed using the robust SVM, which is designed to tolerate noisy training data. Noise in the training data violates the primary SVM assumption that training samples are independent and identically distributed.
Consequently, a standard SVM trained on noisy data produces a highly non-linear decision boundary that generalizes poorly. The robust SVM (RSVM) incorporates an averaging technique in the form of a class center, which smooths the decision surface and controls the amount of regularization automatically. Additionally, the RSVM uses far fewer support vectors than the standard SVM, which reduces run time.
Neural Network-based Approaches: The strength of neural networks in data classification has also been leveraged to detect network anomalies. Neural networks are often combined with other techniques, such as statistical approaches and their variants, for network anomaly detection.
For instance, a replicator neural network (abbreviated RNN in the literature, not to be confused with a recurrent neural network) can provide an outlyingness factor for anomalous network traffic. This is a feed-forward multi-layer perceptron with three hidden layers between the input and output layers; it is trained to reproduce the input data pattern at the output layer with minimal error.
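The outlyingness factor of a record is commonly defined as the average squared difference between the record and the network's reconstruction of it. The sketch below computes only this factor; it does not train a network. The feature values and the reconstructions are made-up numbers standing in for the outputs of a trained replicator network, and the function name is hypothetical.

```python
def outlyingness_factor(record, reconstruction):
    """Mean squared difference between a record and its reconstruction."""
    if len(record) != len(reconstruction):
        raise ValueError("record and reconstruction must have the same length")
    return sum((x - o) ** 2 for x, o in zip(record, reconstruction)) / len(record)

# Hypothetical normalized traffic features, one list per record.
records = [[0.20, 0.40, 0.10], [0.25, 0.35, 0.15], [0.90, 0.05, 0.80]]
# Made-up outputs standing in for a trained replicator network.
reconstructions = [[0.21, 0.39, 0.11], [0.24, 0.36, 0.14], [0.30, 0.35, 0.20]]

factors = [outlyingness_factor(r, o) for r, o in zip(records, reconstructions)]
most_outlying = max(range(len(factors)), key=factors.__getitem__)
print(factors, most_outlying)
```

Records that resemble the training data are reproduced almost exactly and receive a factor near zero, while a record the network cannot reproduce receives a large factor and ranks as the most outlying.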
Rule-based Approaches: Rule-based techniques are widely used among supervised learning algorithms and can be applied in both one-class and multi-class settings. These anomaly detection techniques learn rules that capture a system’s normal behavior. A test instance that is not covered by any such rule is designated as an anomaly. In one study, a rule-based method was proposed for IEC 60870-5-104-driven SCADA networks using deep packet inspection and in-depth protocol analysis.
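The idea of learning rules from normal behavior and flagging any instance that no rule covers can be shown with a minimal sketch. The record fields (protocol, port, bytes), the rule form (a byte range per protocol and port pair), and the sample traffic are all assumptions of this example. They are not related to the SCADA study mentioned above.

```python
def learn_rules(normal_records):
    """Learn one rule per (protocol, port): the byte range seen in normal traffic."""
    rules = {}
    for rec in normal_records:
        key = (rec["protocol"], rec["port"])
        lo, hi = rules.get(key, (rec["bytes"], rec["bytes"]))
        rules[key] = (min(lo, rec["bytes"]), max(hi, rec["bytes"]))
    return rules

def is_anomalous(rec, rules):
    """A record is anomalous when no learned rule covers it."""
    key = (rec["protocol"], rec["port"])
    if key not in rules:
        return True  # protocol and port combination never seen in normal traffic
    lo, hi = rules[key]
    return not (lo <= rec["bytes"] <= hi)

# Hypothetical normal traffic used to learn the rules.
normal = [
    {"protocol": "tcp", "port": 443, "bytes": 500},
    {"protocol": "tcp", "port": 443, "bytes": 1500},
    {"protocol": "udp", "port": 53, "bytes": 80},
    {"protocol": "udp", "port": 53, "bytes": 120},
]
rules = learn_rules(normal)

print(is_anomalous({"protocol": "tcp", "port": 443, "bytes": 900}, rules))
print(is_anomalous({"protocol": "tcp", "port": 23, "bytes": 900}, rules))
print(is_anomalous({"protocol": "udp", "port": 53, "bytes": 4000}, rules))
```

Range rules of this kind are deliberately crude. Practical rule learners derive richer conditions, but the coverage test at detection time works the same way.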
Deep Anomaly Detection (DAD) Models
Supervised DAD: Supervised deep learning-based classification schemes for anomaly detection consist of two sub-networks: a feature extraction network followed by a classifier network. Deep models require a very large number of training samples to learn feature representations that distinguish instances of different classes effectively.
The computational complexity of supervised DAD methods depends on the input data dimension and on the number of hidden layers trained with the back-propagation algorithm. Supervised DAD methods are generally more accurate than semi-supervised and unsupervised models.
Semi-supervised DAD: Semi-supervised (one-class classification) DAD techniques assume that all training instances carry only one class label. Semi-supervised DAD architectures used in anomaly detection include autoencoders (AEs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), corrupted generative adversarial networks (CorGANs), GANs, adversarial autoencoders (AAEs), the denoising autoencoder (DAE)-k-nearest neighbors (KNN) hybrid model, the DBN-random forest (RF) hybrid model, convolutional neural networks (CNNs), the CNN-SVM hybrid model, and recurrent neural networks (RNNs).
GANs trained in semi-supervised learning mode have shown strong performance even with very little labeled data. Moreover, using labeled data from a single class gives semi-supervised DAD models a considerable performance improvement over unsupervised techniques.
Hybrid DAD: In these models, the representative features learned within deep models are fed to traditional algorithms, such as SVM and radial basis function (RBF) classifiers. Deep hybrid models employ two-step learning and have been reported to deliver state-of-the-art performance.
AE-one class SVM, AE-SVM, DBN-support vector data description (SVDD), AE-SVDD, deep neural network (DNN)-SVM, AE-CNN, AE-DBN, AE-KNN, CNN-long short-term memory network (LSTM)-SVM, and convolutional autoencoders (CAE)-one class SVM are the leading deep hybrid architectures used in anomaly detection.
A hybrid model’s computational complexity includes the complexity of both the deep architecture and the traditional algorithm used with it. In deep hybrid models, the feature extractor substantially lessens the “curse of dimensionality”, especially in high-dimensional domains. Additionally, hybrid models are more computationally efficient and scalable because the linear or nonlinear kernel models operate on a reduced input dimension.
Challenges in Anomaly Detection Techniques
Classification-based Anomaly Detection Techniques: Multi-class classification techniques depend on the availability of accurate labels for the different normal classes, which is often difficult to ensure. Moreover, classification-based techniques assign a label to every test instance, which is a disadvantage when a meaningful anomaly score is required for test instances.
Supervised and Hybrid DAD: Deep supervised techniques cannot effectively separate normal data from anomalous data when the feature space is highly complex and non-linear. Similarly, the hybrid approach is suboptimal because it uses generic loss functions rather than objectives customized for anomaly detection, so the downstream detector cannot influence the representation learning within the hidden layers of the feature extractor.
In conclusion, AI has increased the effectiveness and accuracy of anomaly detection approaches. However, existing anomaly detection methods need further research to overcome the limitations described above.
References and Further Reading
Ahmed, M., Mahmood, A. N., Hu, J. (2016). A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60, 19–31. https://doi.org/10.1016/j.jnca.2015.11.016
Chalapathy, R., Chawla, S. (2019). Deep Learning for Anomaly Detection: A Survey. https://www.researchgate.net/publication/330357393_Deep_Learning_for_Anomaly_Detection_A_Survey
Chandola, V., Banerjee, A., Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 1–58. https://doi.org/10.1145/1541880.1541882