Anomaly detection overview

Anomaly detection is a data mining technique that you can use to identify data deviations in a given dataset. For example, if the return rate for a given product increases substantially from the baseline for that product, that might indicate a product defect or potential fraud. You can use anomaly detection to detect critical incidents, such as technical issues, or opportunities, such as changes in consumer behavior.
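To make the return-rate example concrete, the following is a minimal sketch of flagging values that deviate from a baseline. It uses a plain z-score rule, which is a stand-in technique chosen for illustration and not how any of the models described on this page work. The function name `flag_deviations`, the threshold of 3 standard deviations, and the sample return rates are all hypothetical.

```python
from statistics import mean, stdev


def flag_deviations(baseline, observed, threshold=3.0):
    """Return indices of observed values that lie more than `threshold`
    baseline standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline: any value that differs from it is a deviation.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]


# Hypothetical daily return rates for one product.
baseline_return_rates = [0.020, 0.022, 0.019, 0.021, 0.018, 0.020, 0.023, 0.021]
new_return_rates = [0.021, 0.019, 0.058, 0.022]

print(flag_deviations(baseline_return_rates, new_return_rates))  # [2]
```

A fixed rule like this requires you to decide in advance what counts as anomalous, which is the difficulty that the model-based approaches address.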

One challenge when you use anomaly detection is determining what counts as anomalous data. If you have labeled data that identifies anomalies, you can perform anomaly detection by using the ML.PREDICT function with one of the following supervised machine learning models:

If you aren't certain what counts as anomalous data, or you don't have labeled data to train a model on, you can use unsupervised machine learning to perform anomaly detection. Use the ML.DETECT_ANOMALIES function with one of the following models to detect anomalies in training data or new serving data:

| Data type | Model types | What ML.DETECT_ANOMALIES does |
| --- | --- | --- |
| Time series | ARIMA_PLUS | Detect the anomalies in the time series. |
| Time series | ARIMA_PLUS_XREG | Detect the anomalies in the time series with external regressors. |
| Independent and identically distributed random variables (IID) | K-means | Detect anomalies based on the shortest distance among the normalized distances from the input data to each cluster centroid. For a definition of normalized distances, see the k-means model output for the ML.DETECT_ANOMALIES function. |
| Independent and identically distributed random variables (IID) | Autoencoder | Detect anomalies based on the reconstruction loss in terms of mean squared error. For more information, see ML.RECONSTRUCTION_LOSS. The ML.RECONSTRUCTION_LOSS function can retrieve all types of reconstruction loss. |
| Independent and identically distributed random variables (IID) | PCA | Detect anomalies based upon the reconstruction loss in terms of mean squared error. |
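The k-means approach can be sketched conceptually as follows. This is an illustration of the idea only, not the ML.DETECT_ANOMALIES implementation. It assumes that a normalized distance is the Euclidean distance to a centroid divided by a per-cluster radius, and that a `contamination` fraction selects how many of the highest-scoring points to flag. Both are assumptions made for this sketch, as are the function names, centroids, radii, and sample points.

```python
import math


def normalized_distance(point, centroid, radius):
    """Euclidean distance to a centroid, scaled by that cluster's radius
    (an assumed normalization for this sketch)."""
    return math.dist(point, centroid) / radius


def detect_anomalies(points, clusters, contamination):
    """Score each point by its shortest normalized distance to any centroid,
    then flag the highest-scoring `contamination` fraction of points.

    `clusters` is a list of (centroid, radius) pairs.
    Returns a list of (score, is_anomaly) tuples in input order.
    """
    scores = [
        min(normalized_distance(p, c, r) for c, r in clusters) for p in points
    ]
    n_flagged = int(contamination * len(points))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    return [(scores[i], i in flagged) for i in range(len(points))]


# Hypothetical clusters: (centroid, radius).
clusters = [((0.0, 0.0), 1.0), ((10.0, 10.0), 2.0)]
points = [(0.5, 0.0), (10.0, 11.0), (5.0, 5.0), (0.0, 1.0), (9.0, 10.0)]

for point, (score, is_anomaly) in zip(
    points, detect_anomalies(points, clusters, contamination=0.2)
):
    print(point, round(score, 2), is_anomaly)
```

In this sample, only the point (5.0, 5.0) is flagged, because it sits far from both centroids even after scaling. The autoencoder and PCA rows follow the same pattern, with the reconstruction loss taking the place of the distance score.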

By using the default settings in the CREATE MODEL statements and the inference functions, you can create and use an anomaly detection model even without much ML knowledge. However, having basic knowledge about ML development helps you optimize both your data and your model to deliver better results. We recommend using the following resources to develop familiarity with ML techniques and processes: