Privacy metric to compute for reidentification risk analysis.

.. attribute:: type

   Types of analysis.
Categorical stats
l-diversity
delta-presence
Classes
CategoricalStatsConfig
Compute categorical stats over an individual column, including the number of distinct values and the value count distribution.

.. attribute:: field

   Field to compute categorical stats on. All column types are supported except for arrays and structs. However, depending on the data, NumericalStats may be more informative when the field type supports it.
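To make the two quantities concrete, here is a minimal plain-Python sketch of what "number of distinct values" and "value count distribution" mean for one column. It does not call the service; the function name ``categorical_stats`` and the result keys are hypothetical and chosen only for illustration.

```python
from collections import Counter


def categorical_stats(values):
    """Summarize one column: distinct values and how often each occurs."""
    # Count occurrences of each distinct value in the column.
    counts = Counter(values)
    # Histogram of those counts: how many distinct values occur exactly n times.
    frequency_histogram = Counter(counts.values())
    return {
        "distinct_values": len(counts),
        "value_counts": dict(counts),
        "frequency_histogram": dict(frequency_histogram),
    }


stats = categorical_stats(["NY", "CA", "NY", "TX", "CA", "NY"])
print(stats)
```

The sample column is invented; any hashable values would work the same way.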
DeltaPresenceEstimationConfig
δ-presence metric, used to estimate how likely it is for an attacker to determine that a given individual appears in a de-identified dataset. As with the k-map metric, δ-presence cannot be computed exactly without knowing the attack dataset, so a statistical model is used instead.

.. attribute:: quasi_ids

   Required. Fields considered to be quasi-identifiers. No two fields can have the same tag.
Several auxiliary tables can be used in the analysis. Each custom_tag used to tag a quasi-identifier field must appear in exactly one field of one auxiliary table.
KAnonymityConfig
k-anonymity metric, used for analysis of reidentification risk.

.. attribute:: quasi_ids

   Set of fields to compute k-anonymity over. When multiple fields are specified, they are considered a single composite key. Structs and repeated data types are not supported; however, nested fields are supported so long as they are not structs themselves or nested within a repeated field.
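The composite-key behaviour can be sketched in plain Python as follows. This is an illustration of the general k-anonymity idea, not the service's implementation; the function ``k_anonymity`` and the sample rows are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_ids):
    """Return (k, class_sizes), where k is the smallest equivalence-class size."""
    # Multiple quasi-identifier fields are combined into one composite key.
    class_sizes = Counter(tuple(row[field] for field in quasi_ids) for row in rows)
    k = min(class_sizes.values()) if class_sizes else 0
    return k, dict(class_sizes)


# Invented sample data for illustration only.
rows = [
    {"age": 34, "zip": "94107", "diagnosis": "flu"},
    {"age": 34, "zip": "94107", "diagnosis": "cold"},
    {"age": 51, "zip": "10001", "diagnosis": "flu"},
]
k, sizes = k_anonymity(rows, ["age", "zip"])
print(k, sizes)
```

A row whose composite key is shared with few other rows sits in a small equivalence class, which is what drives the k value down.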
KMapEstimationConfig
Reidentifiability metric. This corresponds to a risk model similar to what is called “journalist risk” in the literature, except the attack dataset is statistically modeled instead of being perfectly known. This can be done using publicly available data (like the US Census), or using a custom statistical model (indicated as one or several BigQuery tables), or by extrapolating from the distribution of values in the input dataset.

.. attribute:: quasi_ids

   Required. Fields considered to be quasi-identifiers. No two columns can have the same tag.
Several auxiliary tables can be used in the analysis. Each custom_tag used to tag a quasi-identifier column must appear in exactly one column of one auxiliary table.
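The two tagging rules stated here (no shared tags among quasi-identifiers, and each custom tag present in exactly one column of one auxiliary table) can be checked before submitting a job. The sketch below is a hypothetical local check, not part of the library: ``validate_tags`` and both input shapes are assumptions, and it treats every tag as a custom tag for simplicity.

```python
from collections import Counter


def validate_tags(quasi_id_tags, auxiliary_tables):
    """Check the two tagging rules and return a list of problems (empty if none).

    quasi_id_tags: hypothetical mapping of quasi-identifier column -> tag.
    auxiliary_tables: hypothetical mapping of table name -> {column: custom_tag}.
    """
    problems = []

    # Rule 1: no two quasi-identifier columns can share a tag.
    tag_use = Counter(quasi_id_tags.values())
    for tag, count in tag_use.items():
        if count > 1:
            problems.append(f"tag {tag!r} is used by {count} quasi-identifier columns")

    # Rule 2: each tag must appear in exactly one column of one auxiliary table.
    aux_use = Counter(
        tag for columns in auxiliary_tables.values() for tag in columns.values()
    )
    for tag in tag_use:
        if aux_use[tag] != 1:
            problems.append(
                f"tag {tag!r} appears in {aux_use[tag]} auxiliary columns, expected 1"
            )

    return problems


ok = validate_tags(
    {"age": "age_tag", "zip": "zip_tag"},
    {"census": {"age_col": "age_tag"}, "postal": {"zip_col": "zip_tag"}},
)
bad = validate_tags({"age": "shared", "zip": "shared"}, {})
print(ok, bad)
```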
LDiversityConfig
l-diversity metric, used for analysis of reidentification risk.

.. attribute:: quasi_ids

   Set of quasi-identifiers indicating how equivalence classes are defined for the l-diversity computation. When multiple fields are specified, they are considered a single composite key.
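As a rough illustration of how equivalence classes feed into l-diversity, the following plain-Python sketch groups rows by the composite quasi-identifier key and counts distinct sensitive values per class. It is not the service's implementation; the function ``l_diversity``, its ``sensitive_attribute`` parameter, and the sample rows are assumptions made for this example.

```python
from collections import defaultdict


def l_diversity(rows, quasi_ids, sensitive_attribute):
    """Return (l, diversity), where l is the fewest distinct sensitive values in any class."""
    # Group sensitive values by the composite quasi-identifier key.
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[field] for field in quasi_ids)
        classes[key].add(row[sensitive_attribute])
    # Distinct sensitive values observed in each equivalence class.
    diversity = {key: len(values) for key, values in classes.items()}
    l_value = min(diversity.values()) if diversity else 0
    return l_value, diversity


# Invented sample data for illustration only.
rows = [
    {"age": 34, "zip": "94107", "diagnosis": "flu"},
    {"age": 34, "zip": "94107", "diagnosis": "cold"},
    {"age": 51, "zip": "10001", "diagnosis": "flu"},
]
l_value, diversity = l_diversity(rows, ["age", "zip"], "diagnosis")
print(l_value, diversity)
```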
NumericalStatsConfig
Compute numerical stats over an individual column, including min, max, and quantiles.

.. attribute:: field

   Field to compute numerical stats on. Supported types are integer, float, date, datetime, timestamp, time.
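For a numeric column, the three statistics named above can be approximated locally with the standard library. This is only a sketch: the function ``numerical_stats`` is hypothetical, it handles plain numbers rather than dates or times, and the quantile method used by ``statistics.quantiles`` (Python 3.8+) may differ from what the service reports.

```python
import statistics


def numerical_stats(values, n=4):
    """Return min, max, and the n-1 cut points that split the data into n groups."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        # statistics.quantiles needs at least two data points.
        "quantiles": statistics.quantiles(ordered, n=n),
    }


summary = numerical_stats([7, 1, 4, 2, 6, 3, 5])
print(summary)
```

Passing a larger ``n`` (for example ``n=100``) yields finer-grained cut points over the same data.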