The ML.GET_INSIGHTS function

This document describes the ML.GET_INSIGHTS function, which you can use to retrieve information about changes to key metrics in your multi-dimensional data from a contribution analysis model. You can use a CREATE MODEL statement to create a contribution analysis model in BigQuery.

Syntax

ML.GET_INSIGHTS(
  MODEL `project_id.dataset.model_name`
)

Arguments

ML.GET_INSIGHTS takes the following arguments:

  • project_id: Your project ID.
  • dataset: The BigQuery dataset that contains the model.
  • model_name: The name of the contribution analysis model.
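
To make the syntax concrete, here is a minimal Python sketch that assembles an ML.GET_INSIGHTS query string. The project, dataset, and model names are hypothetical; substitute your own. You could pass the resulting string to a BigQuery client, for example the `google-cloud-bigquery` library's `Client.query()` method.

```python
# Sketch: building an ML.GET_INSIGHTS query string in Python.
# "my-project", "my_dataset", and "my_model" are hypothetical names.

def build_get_insights_query(project_id: str, dataset: str, model_name: str) -> str:
    """Return SQL that retrieves insights from a contribution analysis model."""
    return (
        "SELECT *\n"
        f"FROM ML.GET_INSIGHTS(MODEL `{project_id}.{dataset}.{model_name}`)"
    )

query = build_get_insights_query("my-project", "my_dataset", "my_model")
print(query)
```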

Output

Some of the ML.GET_INSIGHTS output columns contain metrics that compare the values for a given segment in either the test or control dataset against the values for the population, which is all segments in the same dataset. The metric values calculated for the entire population except for the given segment are referred to as ambient values.

Output for summable metric contribution analysis models

ML.GET_INSIGHTS returns the following output columns for contribution analysis models that use summable metrics, in addition to any input data columns specified in the query_statement of the contribution analysis model:

  • contributors: an ARRAY<STRING> value that contains the dimension values for a given segment. The other output metrics that are returned in the same row apply to the segment described by these dimensions.
  • metric_test: a NUMERIC value that contains the sum of the value of the metric column in the test dataset for the given segment. The metric column is specified in the CONTRIBUTION_METRIC option of the contribution analysis model.
  • metric_control: a NUMERIC value that contains the sum of the value of the metric column in the control dataset for the given segment. The metric column is specified in the CONTRIBUTION_METRIC option of the contribution analysis model.
  • difference: a NUMERIC value that contains the difference between the metric_test and metric_control values, calculated as metric_test - metric_control.
  • relative_difference: a NUMERIC value that contains the relative change in the segment value between the test and control datasets, calculated as difference / metric_control.
  • unexpected_difference: a NUMERIC value that contains the unexpected difference between the segment's actual metric_test value and the segment's expected metric_test value, which is determined by comparing the ratio of change for this segment against the ambient ratio of change. The unexpected_difference value is calculated as follows:

    1. Determine the metric_test value for all segments except the given segment, referred to here as ambient_test_change:

      ambient_test_change = sum(metric_test for the population) - metric_test

    2. Determine the metric_control value for all segments except the given segment, referred to here as ambient_control_change:

      ambient_control_change = sum(metric_control for the population) - metric_control

    3. Determine the ratio between the ambient_test_change and ambient_control_change values, referred to here as ambient_change_ratio:

      ambient_change_ratio = ambient_test_change / ambient_control_change

    4. Determine the expected metric_test value for the given segment, referred to here as expected_metric_test:

      expected_metric_test = metric_control * ambient_change_ratio

    5. Determine the unexpected_difference value:

      unexpected_difference = metric_test - expected_metric_test

  • relative_unexpected_difference: a NUMERIC value that contains the ratio between the unexpected_difference value and the expected_metric_test value, calculated as unexpected_difference / expected_metric_test. You can use the relative_unexpected_difference value to determine whether the change to this segment is larger or smaller than expected compared to the change in all of the other segments.

  • apriori_support: a NUMERIC value that contains the apriori support value for the segment. The apriori support value is either the ratio between the metric_test value for the segment and the metric_test value for the population, or the ratio between the metric_control value for the segment and the metric_control value for the population, whichever is greater. The calculation is expressed as GREATEST(metric_test / sum(metric_test for the population), metric_control / sum(metric_control for the population)). If the apriori_support value is less than the MIN_APRIORI_SUPPORT option value specified in the model, then the segment is considered too small to be of interest and is excluded by the model.

You might find it useful to order the output by the unexpected_difference column so that you can quickly identify the contributors associated with the largest differences between the test and control datasets.
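
The five steps above can be sketched in Python. The segment and population totals below are hypothetical numbers, chosen only to illustrate the arithmetic:

```python
# Hypothetical totals for one segment and for the whole population.
segment_metric_test = 120.0        # metric sum for this segment, test data
segment_metric_control = 100.0     # metric sum for this segment, control data
population_metric_test = 1000.0    # metric sum over all segments, test data
population_metric_control = 800.0  # metric sum over all segments, control data

# Steps 1-2: ambient values exclude the given segment from the population totals.
ambient_test_change = population_metric_test - segment_metric_test           # 880.0
ambient_control_change = population_metric_control - segment_metric_control  # 700.0

# Step 3: ratio of change in the rest of the population.
ambient_change_ratio = ambient_test_change / ambient_control_change

# Step 4: what metric_test would be if this segment changed like the ambient population.
expected_metric_test = segment_metric_control * ambient_change_ratio

# Step 5: how far the segment deviates from that expectation.
unexpected_difference = segment_metric_test - expected_metric_test
relative_unexpected_difference = unexpected_difference / expected_metric_test

# apriori_support: the segment's share of the population, in whichever
# dataset that share is larger.
apriori_support = max(
    segment_metric_test / population_metric_test,
    segment_metric_control / population_metric_control,
)

print(round(unexpected_difference, 4))           # -5.7143
print(round(relative_unexpected_difference, 4))  # -0.0455
print(apriori_support)                           # 0.125
```

Here the segment grew by 20% while the rest of the population grew by roughly 25.7%, so the segment's test value is about 5.7 below expectation.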

Output for summable ratio metric contribution analysis models

ML.GET_INSIGHTS returns the following output columns for contribution analysis models that use summable ratio metrics, in addition to any input data columns specified in the query_statement of the contribution analysis model:

  • contributors: an ARRAY<STRING> value that contains the dimension values for a given segment. The other output metrics that are returned in the same row apply to the segment described by these dimensions.
  • ratio_test: a NUMERIC value that contains the ratio between the two metrics that you are evaluating, in the test dataset for the given segment. These two metrics are specified in the CONTRIBUTION_METRIC option of the contribution analysis model. The ratio_test value is calculated as sum(numerator_metric_column_name) / sum(denominator_metric_column_name).
  • ratio_control: a NUMERIC value that contains the ratio between the two metrics that you are evaluating, in the control dataset for the given segment. These two metrics are specified in the CONTRIBUTION_METRIC option of the contribution analysis model. The ratio_control value is calculated as sum(numerator_metric_column_name) / sum(denominator_metric_column_name).
  • regional_relative_ratio: a NUMERIC value that contains the ratio between the ratio_test value and the ratio_control value, calculated as ratio_test / ratio_control.
  • ambient_relative_ratio_test: a NUMERIC value that contains the ratio between the ratio_test value for this segment and the ambient ratio_test value, calculated as ratio_test / ambient ratio_test. The ambient ratio_test value is the ratio_test value calculated over all segments in the population except the given segment. You can use the ambient_relative_ratio_test value to compare this segment's ratio to the combined ratio of the other segments.

    For example, consider the following table of test data:

      dim1 | dim2 | dim3 | metric_a | metric_b
      ---- | ---- | ---- | -------- | --------
      1    | 10   | 20   | 50       | 100
      1    | 15   | 30   | 100      | 200
      5    | 20   | 40   | 1        | 10
    Assume that the CONTRIBUTION_METRIC value is sum(metric_a)/sum(metric_b). Using the data in the preceding table, the metric_a value for the population is 151, while the metric_b value is 310. The ambient_relative_ratio_test value for the first segment in the table is calculated as (50/100) / (101/210) = 0.50 / 0.48 ≈ 1.04. This ambient_relative_ratio_test value indicates that the ratio for this segment is fairly close to the ratio for all of the other segments combined. Alternatively, the ambient_relative_ratio_test value for the last segment in the table is calculated as (1/10) / (150/300) = 0.10 / 0.50 = 0.2. This ambient_relative_ratio_test value indicates that the ratio for this segment is smaller than the combined ratio of the rest of the segments.

  • ambient_relative_ratio_control: a NUMERIC value that contains the ratio between the ratio_control value for this segment and the ambient ratio_control value, calculated as ratio_control / ambient ratio_control. The ambient ratio_control value is the ratio_control value calculated over all segments in the population except the given segment. You can use the ambient_relative_ratio_control value to compare this segment's ratio to the combined ratio of the other segments.

  • aumann_shapley_attribution: a NUMERIC value that contains the Aumann-Shapley value for this segment. The Aumann-Shapley value measures the contribution of the segment ratio relative to the population ratio; more generally, you can use Aumann-Shapley values to determine how much a feature contributes to a prediction. In the context of contribution analysis, BigQuery ML uses the Aumann-Shapley value to measure the attribution of the segment relative to the population, taking into account both the segment ratio changes and the ambient population changes between the test and control datasets.

  • apriori_support: a NUMERIC value that contains the apriori support value for the segment. The apriori support value is calculated using the numerator column specified in the model's CONTRIBUTION_METRIC option. The calculation is expressed as numerator column value for the given segment / sum(numerator column value for the population). If the apriori_support value is less than the MIN_APRIORI_SUPPORT option value specified in the model, then the segment is considered too small to be of interest and is excluded by the model.

You might find it useful to order the output by the aumann_shapley_attribution column so that you can quickly identify the contributors associated with the largest differences between the test and control datasets.
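
The ambient_relative_ratio_test arithmetic can be sketched in Python, assuming the same three segments of test data shown in the example table:

```python
# Test-dataset segments from the example table: (metric_a, metric_b) per segment.
segments = [(50, 100), (100, 200), (1, 10)]

total_a = sum(a for a, _ in segments)  # population metric_a: 151
total_b = sum(b for _, b in segments)  # population metric_b: 310

def ambient_relative_ratio_test(index: int) -> float:
    """Segment's ratio_test divided by the ratio over all other segments."""
    a, b = segments[index]
    ratio_test = a / b
    # Ambient ratio excludes this segment from the population totals.
    ambient_ratio = (total_a - a) / (total_b - b)
    return ratio_test / ambient_ratio

print(round(ambient_relative_ratio_test(0), 2))  # 1.04
print(round(ambient_relative_ratio_test(2), 2))  # 0.2
```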

What's next

Get data insights from a contribution analysis model.