Contribution analysis overview
You can use contribution analysis, also called key driver analysis, to generate
insights about changes to key metrics in your multi-dimensional data. For
example, you can use contribution analysis to see the change in revenue numbers
across two quarters, or to compare two sets of training data to understand
changes in an ML model's performance. You can use a CREATE MODEL statement
to create a contribution analysis model in BigQuery.
Contribution analysis is a form of augmented analytics, which is the use of artificial intelligence (AI) to enhance and automate the analysis and understanding of data. Contribution analysis accomplishes one of the key goals of augmented analytics, which is to help users find patterns in their data.
A contribution analysis model detects segments of the data that show changes in a metric by comparing a test set of data to a control set of data. For example, you might use a table snapshot of sales data taken at the end of 2023 as your test data and a table snapshot taken at the end of 2022 as your control data to see how your sales changed over time. A contribution analysis model could show you which segment of data, such as online customers in a particular region, drove the biggest change in sales from one year to the next.
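To make the test-versus-control comparison concrete, the following Python sketch sums a metric for each segment in a hypothetical control set and test set, then reports the segment whose total changed the most. It only illustrates the idea and is not how BigQuery computes contribution analysis. The rows, the region and channel dimensions, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, channel, revenue).
control_rows = [  # stands in for a snapshot taken at the end of 2022
    ("west", "online", 100.0),
    ("west", "store", 80.0),
    ("east", "online", 50.0),
]
test_rows = [  # stands in for a snapshot taken at the end of 2023
    ("west", "online", 180.0),
    ("west", "store", 75.0),
    ("east", "online", 55.0),
]

def totals_by_segment(rows):
    """Sum revenue for each (region, channel) segment."""
    totals = defaultdict(float)
    for region, channel, revenue in rows:
        totals[(region, channel)] += revenue
    return totals

def segment_changes(test, control):
    """Return {segment: test_total - control_total} for every segment in either set."""
    t, c = totals_by_segment(test), totals_by_segment(control)
    return {seg: t.get(seg, 0.0) - c.get(seg, 0.0) for seg in set(t) | set(c)}

changes = segment_changes(test_rows, control_rows)
# Pick the segment with the largest absolute change in revenue.
biggest = max(changes, key=lambda seg: abs(changes[seg]))
print(biggest, changes[biggest])  # ('west', 'online') 80.0
```

A real model compares many overlapping segments at once; the segment definitions later on this page show how those segments are formed.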
The metric is the numerical value that contribution analysis models use to measure and compare the changes between the test and control data. You can specify a summable metric, summable ratio metric, or summable by category metric with a contribution analysis model.
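The three metric types differ in how they aggregate rows. The following Python sketch assumes that each type aggregates as follows: SUM(x) for a summable metric, SUM(numerator) / SUM(denominator) for a summable ratio metric, and SUM(x) / COUNT(DISTINCT category) for a summable by category metric. The rows, column names, and helper functions are hypothetical.

```python
# Hypothetical rows: revenue, cost, and the customer who generated them.
rows = [
    {"revenue": 10.0, "cost": 4.0, "customer_id": "c1"},
    {"revenue": 20.0, "cost": 6.0, "customer_id": "c1"},
    {"revenue": 30.0, "cost": 10.0, "customer_id": "c2"},
]

def summable(rows, col):
    """Summable metric (assumed): SUM(col)."""
    return sum(r[col] for r in rows)

def summable_ratio(rows, num, den):
    """Summable ratio metric (assumed): SUM(num) / SUM(den)."""
    return summable(rows, num) / summable(rows, den)

def summable_by_category(rows, col, category):
    """Summable by category metric (assumed): SUM(col) / COUNT(DISTINCT category)."""
    return summable(rows, col) / len({r[category] for r in rows})

print(summable(rows, "revenue"))                             # 60.0
print(summable_ratio(rows, "revenue", "cost"))               # 3.0
print(summable_by_category(rows, "revenue", "customer_id"))  # 30.0
```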
A segment is a slice of the data identified by a given combination of dimension
values. For example, for a contribution analysis model based on the
store_number, customer_id, and day dimensions, every unique combination
of those dimension values represents a segment. In the following table,
each row represents a different segment:
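store_number | customer_id | day
store 1      |             |
store 1      | customer 1  |
store 1      | customer 1  | Monday
store 1      | customer 1  | Tuesday
store 1      | customer 2  |
store 2      |             |

A blank cell means that the segment is not restricted on that dimension. For
example, the first row is the segment that contains all data for store 1.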
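The segments in the table come from keeping or dropping each dimension value of the underlying rows. The following Python sketch enumerates every segment that at least one row falls into, using None to mean that a dimension is unrestricted (a convention chosen here to mirror the blank cells in the table). The rows are hypothetical, and the brute-force enumeration is an illustration, not BigQuery's algorithm.

```python
from itertools import product

# Hypothetical rows over the dimensions (store_number, customer_id, day).
rows = [
    ("store 1", "customer 1", "Monday"),
    ("store 1", "customer 1", "Tuesday"),
    ("store 1", "customer 2", "Monday"),
    ("store 2", "customer 3", "Monday"),
]

def segments(rows):
    """Return every segment matched by at least one row.

    Each segment is a tuple with one entry per dimension; None means the
    segment is not restricted on that dimension (an illustrative convention).
    """
    found = set()
    for row in rows:
        # For each dimension, either keep the row's value or generalize to None.
        for choice in product(*[(value, None) for value in row]):
            found.add(choice)
    return found

segs = segments(rows)
print(len(segs))  # 22
```

With n dimensions, each row belongs to 2^n segments, so the number of candidate segments grows quickly as you add dimensions. That growth is why pruning small segments matters.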
To reduce model creation time and limit the model to the largest and most relevant segments, specify an apriori support threshold. The model prunes segments that are too small to meet the threshold and doesn't use them.
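The following minimal Python sketch shows threshold pruning. It assumes that a segment's support is the fraction of rows the segment covers. That definition is an assumption for illustration, and BigQuery's own support calculation may differ. The min_support parameter name and the rows are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical rows over (store_number, customer_id, day).
rows = [
    ("store 1", "customer 1", "Monday"),
    ("store 1", "customer 1", "Tuesday"),
    ("store 1", "customer 2", "Monday"),
    ("store 2", "customer 3", "Monday"),
]

def segment_support(rows):
    """Map each segment (None = unrestricted) to the fraction of rows it covers.

    Assumption: support is measured by row count, for illustration only.
    """
    counts = Counter()
    for row in rows:
        # A row belongs to every segment obtained by generalizing some values to None.
        for seg in product(*[(value, None) for value in row]):
            counts[seg] += 1
    return {seg: n / len(rows) for seg, n in counts.items()}

def prune(support, min_support):
    """Keep only segments whose support is at least min_support (hypothetical name)."""
    return {seg: s for seg, s in support.items() if s >= min_support}

kept = prune(segment_support(rows), min_support=0.5)
print(len(kept))  # 6
```

Under a row-count definition, a more specific segment can never have higher support than a broader segment that contains it. Apriori-style algorithms rely on that property to skip every refinement of a pruned segment; this sketch simply filters after enumerating everything, for clarity.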
After you have created a contribution analysis model, you can use the
ML.GET_INSIGHTS function to retrieve the metric information calculated by the
model. The model output consists of rows of insights, where each insight
provides a segment and its corresponding metrics.
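Conceptually, each insight pairs a segment with metric values from the test and control data. The following Python sketch models that shape with a hypothetical Insight dataclass. Its field names are illustrative and are not the actual ML.GET_INSIGHTS output schema. Sorting by absolute difference is only one way to scan such rows.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Insight:
    """Hypothetical insight row: a segment plus its test and control metric values."""
    segment: Tuple[Optional[str], ...]  # None = dimension not restricted
    metric_test: float
    metric_control: float

    @property
    def difference(self) -> float:
        """Change in the metric from control to test."""
        return self.metric_test - self.metric_control

def rank_insights(insights):
    """Order insights so that the largest absolute change comes first."""
    return sorted(insights, key=lambda i: abs(i.difference), reverse=True)

# Hypothetical insights over (store_number, customer_id, day).
insights = [
    Insight(("store 1", None, None), metric_test=250.0, metric_control=200.0),
    Insight(("store 2", None, None), metric_test=90.0, metric_control=150.0),
    Insight(("store 1", "customer 1", None), metric_test=120.0, metric_control=110.0),
]
top = rank_insights(insights)[0]
print(top.segment, top.difference)  # ('store 2', None, None) -60.0
```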