Clustering overview
Clustering is an unsupervised machine learning technique that you can use to group similar records together. It is useful when you want to understand what groups, or clusters, exist in your data but you don't have labeled data to train a model on. For example, if you had unlabeled data about subway ticket purchases, you could cluster that data by ticket purchase time to better understand which time periods have the heaviest subway usage. For more information, see What is clustering?
K-means models are widely used to perform clustering. You can use k-means models with the ML.PREDICT function to cluster data, or with the ML.DETECT_ANOMALIES function to perform anomaly detection. K-means models use centroid-based clustering to organize data into clusters. To get information about a k-means model's centroids, you can use the ML.CENTROIDS function.
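To make the idea of centroid-based clustering concrete, the following is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) applied to the subway example. It is an illustration only and is not how any particular product implements k-means. The function name `kmeans_1d`, the sample purchase hours, and the evenly spaced initialization are all assumptions made for this example.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=20):
    """Illustrative 1-D k-means (Lloyd's algorithm).

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid closest to values[i].
    """
    ordered = sorted(values)
    # Assumption for reproducibility: start from k evenly spaced points
    # in the sorted data rather than from random points.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the clustering is stable.
        centroids = new_centroids
    return centroids, assignments


# Hypothetical ticket purchase times, expressed as hours of the day.
purchase_hours = [7.5, 8.0, 8.25, 8.5, 9.0, 12.0, 12.5, 13.0, 17.0, 17.5, 18.0, 18.5]
centroids, assignments = kmeans_1d(purchase_hours, k=3)
print(centroids)
print(assignments)
```

With this sample data, the three centroids settle on a morning, a midday, and an evening time, and each purchase is labeled with the index of its nearest centroid. Conceptually, the assignments correspond to what a prediction step returns for each record, and the centroids correspond to what a centroid lookup reports about the model.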
Recommended knowledge
By using the default settings in the CREATE MODEL statements and the inference functions, you can create and use a clustering model even without much ML knowledge. However, having basic knowledge about ML development, and clustering models in particular, helps you optimize both your data and your model to deliver better results. We recommend using the following resources to develop familiarity with ML techniques and processes: