Measuring re-identification and disclosure risk

Cloud Data Loss Prevention (Cloud DLP) is now part of Sensitive Data Protection. The API name remains the same: Cloud Data Loss Prevention API (DLP API). For information about the services that make up Sensitive Data Protection, see the Sensitive Data Protection overview.
Re-identification risk analysis, or just risk analysis, is the process of
analyzing sensitive data to find properties that might increase the risk of
subjects being identified. You can use risk analysis methods before
de-identification to help determine an effective de-identification strategy or
after de-identification to monitor for any changes or outliers.
Sensitive Data Protection can compute four re-identification risk metrics: k-anonymity,
l-diversity, k-map, and δ-presence. If you're not familiar with risk
analysis or these metrics, see the risk analysis concept
topic before continuing on.
This section provides overviews of how to use Sensitive Data Protection for risk
analysis of structured data using any of these metrics, plus other related
topics.
Calculate re-identification risk
Sensitive Data Protection can analyze your structured data stored in
BigQuery tables and compute the following re-identification risk
metrics. Click the link for the metric you want to calculate to learn more.
k-anonymity: A property of a dataset that indicates the re-identifiability of its records. A dataset is k-anonymous if the quasi-identifiers for each person in the dataset are identical to those of at least k − 1 other people in the dataset.
l-diversity: An extension of k-anonymity that also measures the diversity of sensitive values for each column in which they occur. A dataset has l-diversity if, for every set of rows with identical quasi-identifiers, there are at least l distinct values for each sensitive attribute.
k-map: Computes re-identifiability risk by comparing quasi-identifiers in a given dataset against a larger re-identification (or "attack") dataset.
δ-presence: Estimates the probability that a given individual in a larger population is present in the dataset. Use this metric when membership in the dataset is itself sensitive information.
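To make the first two metrics concrete, here is a minimal, self-contained sketch of what k-anonymity and l-diversity measure, computed over a small in-memory table. This is an illustration of the metrics themselves, not the DLP API; the column names and records are hypothetical.

```python
from collections import defaultdict

def k_anonymity(rows, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    classes = defaultdict(int)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)] += 1
    return min(classes.values())

def l_diversity(rows, quasi_ids, sensitive):
    """Fewest distinct sensitive values within any equivalence class."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return min(len(values) for values in classes.values())

# Hypothetical records: age bracket and ZIP code are the quasi-identifiers,
# and the medical condition is the sensitive attribute.
rows = [
    {"age": "20-30", "zip": "94101", "condition": "flu"},
    {"age": "20-30", "zip": "94101", "condition": "asthma"},
    {"age": "30-40", "zip": "94102", "condition": "flu"},
    {"age": "30-40", "zip": "94102", "condition": "flu"},
]

print(k_anonymity(rows, ["age", "zip"]))               # each class has 2 rows -> 2
print(l_diversity(rows, ["age", "zip"], "condition"))  # second class has 1 distinct value -> 1
```

Note how the dataset is 2-anonymous but only 1-diverse: the second equivalence class contains a single sensitive value, so knowing someone's quasi-identifiers reveals their condition even though no single row is unique.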
Calculate other statistics
Sensitive Data Protection can also compute numerical and categorical
statistics for data stored in BigQuery tables using the same
DlpJob resource as the
risk analysis APIs.
For more information, see Computing numerical and categorical statistics.
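As a rough illustration of the kind of summary such a job reports (min, max, and quantile boundaries for a numeric column; value frequencies for a categorical column), here is a plain-Python sketch under those assumptions, with no dependency on the DLP API:

```python
from collections import Counter

def numerical_stats(values, num_quantiles=4):
    """Min, max, and quantile boundaries for a numeric column,
    similar in shape to the output of a numerical-stats job."""
    ordered = sorted(values)
    quantiles = [
        ordered[min(len(ordered) - 1, (i * len(ordered)) // num_quantiles)]
        for i in range(1, num_quantiles)
    ]
    return {"min": ordered[0], "max": ordered[-1], "quantiles": quantiles}

def categorical_stats(values):
    """Value frequencies for a categorical column, most common first."""
    return Counter(values).most_common()

print(numerical_stats([3, 1, 4, 1, 5, 9, 2, 6]))
# {'min': 1, 'max': 9, 'quantiles': [2, 4, 6]}
print(categorical_stats(["US", "DE", "US"]))
# [('US', 2), ('DE', 1)]
```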
Visualize re-identification risk
You can visualize the risk metrics that Sensitive Data Protection calculates
directly in the Google Cloud console (k-anonymity or
l-diversity), or by using other
Google Cloud products.
After calculating k-anonymity values for a dataset using Sensitive Data Protection, you can visualize the results in Looker Studio. Doing so helps you better understand re-identification risk and evaluate the utility trade-offs you might make if you redact or de-identify the data.
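A k-anonymity analysis typically yields one size per equivalence class; to chart those results, it helps to bucket the class sizes into ranges first. The following sketch (hypothetical bucket boundaries and class sizes, not DLP API output) shows one way to build such a histogram for charting:

```python
from collections import Counter

def class_size_histogram(class_sizes, boundaries=(2, 10, 100)):
    """Bucket equivalence-class sizes into ranges so the distribution
    can be charted, e.g. in Looker Studio."""
    buckets = Counter()
    for size in class_sizes:
        label = next(
            (f"< {b}" for b in boundaries if size < b), f">= {boundaries[-1]}"
        )
        buckets[label] += 1
    return dict(buckets)

# Hypothetical equivalence-class sizes from a k-anonymity analysis.
print(class_size_histogram([1, 1, 3, 5, 12, 250]))
# {'< 2': 2, '< 10': 2, '< 100': 1, '>= 100': 1}
```

Classes in the smallest bucket (here, size 1) are the records most at risk of re-identification, so that bucket is usually the one to watch when monitoring a de-identified dataset.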
Last updated 2025-08-28 UTC.