Propensity modeling for gaming applications

Learn how to use BigQuery ML to train, evaluate, and get predictions from several different types of propensity models. Propensity models can help you to determine the likelihood of specific users returning to your app, so you can use that information in marketing decisions.


If you are a mobile game developer, user retention is probably one of your challenges. According to the Mobile Gaming Industry Analysis in 2019, most mobile games see only a 25% retention rate for users after the first day. To retain a larger percentage of users after that, you can take steps to motivate and incentivize those users who are most likely to return. But to target those users, you need to identify the propensity of any specific user to return after the first 24 hours. If you develop other types of online applications that prioritize reducing audience churn, this approach should also work for you.

To implement the approach described in this article, use this notebook.

This article discusses a high-level approach to propensity modeling, including:

  • Preparing gaming data so it can be used to train a machine learning (ML) model
  • Choosing what type of propensity model to use
  • Training and evaluating the model
  • Getting predictions from the model
  • Exporting prediction data for use in marketing campaigns


The solution is intended for online application developers, especially mobile game developers. It assumes that you have basic knowledge of the following:

  • Machine learning concepts
  • Standard SQL

Sample dataset

The solution uses the public firebase-public-project.analytics_153293282.events_20181003 dataset. This dataset contains Google Analytics 4 (GA4) data from a real mobile game app called "Flood It!", comprising 5.7 million events from over 15,000 users.

GA4 uses an event-based measurement model. Each row in this dataset is a unique event, which contains nested fields for event parameters. To learn more about the structure of the data, see the Schema for BigQuery Export.
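
Because each event's parameters are stored in a nested array, you typically read them with UNNEST. The following query is a minimal sketch of this pattern against the sample table; the parameter key 'board' is purely illustrative and might not exist for this event:

```
-- Illustrative: read one nested event parameter per event row.
-- The 'board' key is an assumption for demonstration purposes.
SELECT
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'board') AS board
FROM
  `firebase-public-project.analytics_153293282.events_20181003`
WHERE
  event_name = 'level_complete_quickplay'
LIMIT 10
```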

Exporting Google Analytics data to BigQuery

If instead of the sample data, you want to use your own data from a GA4 property, you can follow the instructions in (GA4) Set up BigQuery Export to export your data.

Processing the data

You must process the event data to get it into the right shape and format to use as training data. Processing the data accomplishes the following goals:

  • Filters out users who are unlikely to return to the app
  • Creates features for user demographic data
  • Creates features for user behavioral data

Having a combination of both demographic data and behavioral data helps to create a more predictive model.

Once processed, each row of the training data represents the data for a unique user, identified by the user_pseudo_id column.

Labeling users

To process the data, start by labeling records to identify users who seem unlikely to return to the app. In this solution, you label records to categorize them based on their interaction with the application.

First, you filter out users who spent less than 10 minutes using the app during their first visit. You label these users as bounced.

You then label the remaining users as churned if they have no event data later than 24 hours after first engaging with the application, or as returned if they have at least one event record after that point. The model uses these labels during training to learn how to identify the likelihood that a user will fall into one category or the other. You can then use the trained model to predict that likelihood for specific users.
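
As a sketch of this labeling logic, assuming a hypothetical per-user summary table (bqmlga4.user_summary) that holds first-engagement and last-event timestamps plus first-visit engagement time, the labels could be derived like this:

```
-- Illustrative labeling logic; the table and column names here are
-- assumptions, not the notebook's actual schema.
SELECT
  user_pseudo_id,
  -- churned = 1 when the user has no events more than 24 hours
  -- after first engaging with the app; returned users get 0
  IF(last_event_timestamp <=
     TIMESTAMP_ADD(first_engagement_timestamp, INTERVAL 24 HOUR),
     1, 0) AS churned
FROM
  bqmlga4.user_summary
WHERE
  -- filter out bounced users: less than 10 minutes in the app on the first visit
  first_visit_engagement_msec >= 10 * 60 * 1000
```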

If you have a different use case, you can use different criteria to identify bounced or churned customers. If you want to predict something other than audience churn, you can consider, for example:

  • Whether a user is likely to spend money on in-game currency
  • The likelihood of the user completing n game levels
  • The likelihood of the user spending n amount of time in-game

Processing demographic features

The next step in processing the data is to add features for user demographics. This solution uses the following fields as demographic features:

  • device.operating_system
  • device.language

However, you can choose different fields if those work better for your use case.

A user might have multiple unique values in these fields, for example, if they used the app from two different devices. To keep things simple, this solution uses the values from the first user engagement event.
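
For example, one way to take the demographic values from each user's earliest event is an ARRAY_AGG ordered by timestamp. This is a sketch of the pattern against the sample table, not the notebook's exact query:

```
-- Illustrative: take demographic fields from each user's first event.
SELECT
  user_pseudo_id,
  ARRAY_AGG(device.operating_system
            ORDER BY event_timestamp ASC LIMIT 1)[OFFSET(0)] AS operating_system,
  ARRAY_AGG(device.language
            ORDER BY event_timestamp ASC LIMIT 1)[OFFSET(0)] AS language
FROM
  `firebase-public-project.analytics_153293282.events_20181003`
GROUP BY
  user_pseudo_id
```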

Processing behavioral features

The final step in processing the data is to add features for user behavior.

To extract user behavior from the data, the solution analyzes each user's activities in the first 24 hours of user engagement. In addition to the events automatically collected by Analytics, there are also the recommended events for games that you can explore to understand user behavior. To predict user churn in this case, the solution counts the number of times the following events are collected for each user within 24 hours of first user engagement:

  • user_engagement
  • level_start_quickplay
  • level_end_quickplay
  • level_complete_quickplay
  • level_reset_quickplay
  • post_score
  • spend_virtual_currency
  • ad_reward
  • challenge_a_friend
  • completed_5_levels
  • use_extra_steps
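
A sketch of this counting step, assuming a hypothetical bqmlga4.first_engagement table that holds each user's first-engagement timestamp (the notebook's actual tables may be organized differently):

```
-- Illustrative: count selected events per user within 24 hours of
-- first engagement. COUNTIF lines for the remaining events are elided.
SELECT
  user_pseudo_id,
  COUNTIF(event_name = 'user_engagement') AS cnt_user_engagement,
  COUNTIF(event_name = 'level_start_quickplay') AS cnt_level_start_quickplay,
  COUNTIF(event_name = 'post_score') AS cnt_post_score,
  COUNTIF(event_name = 'spend_virtual_currency') AS cnt_spend_virtual_currency
  -- ...and so on, one COUNTIF per event in the list above
FROM
  `firebase-public-project.analytics_153293282.events_20181003`
JOIN
  bqmlga4.first_engagement  -- hypothetical table of first-engagement timestamps
  USING (user_pseudo_id)
WHERE
  TIMESTAMP_MICROS(event_timestamp)
    <= TIMESTAMP_ADD(first_engagement_timestamp, INTERVAL 24 HOUR)
GROUP BY
  user_pseudo_id
```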

You can use different sets of events if your use case is different from the one described here. To view a list of available events, use the following query:

    SELECT
      event_name,
      COUNT(event_name) as event_count
    FROM
      `firebase-public-project.analytics_153293282.events_20181003`
    GROUP BY
      event_name
    ORDER BY
      event_count DESC

Choosing a model

Once the training data is prepared, you are ready to create an ML model. You can choose from a number of classification algorithms for the model. The following table lists the model types and their pros and cons:

| Model | model_type | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Logistic regression | LOGISTIC_REG | Faster to train than the other model types. | Might not have the highest performance. |
| XGBoost | BOOSTED_TREE_CLASSIFIER | Higher performance than a LOGISTIC_REG model. Allows inspection of feature importance. | Slower to train than a LOGISTIC_REG model. |
| Deep neural networks | DNN_CLASSIFIER | Higher performance than a LOGISTIC_REG model. | Slower to train than a LOGISTIC_REG model. |
| AutoML Tables | AUTOML_CLASSIFIER | Higher performance than a LOGISTIC_REG model. | Might take longer to train than other model types. Limited model explainability. |

This solution defaults to using a logistic regression model because it is the fastest to train, but in the notebook you can choose to use one of the other model types if you prefer.

Training the model

Train the classification model by using BigQuery ML. The trained model outputs a propensity score that indicates the probability of a user churning. A 100% probability of churning is indicated by churned=1 and a 100% probability of returning is indicated by churned=0, with most results falling between these boundaries.

When you use the CREATE MODEL statement, BigQuery ML automatically splits the training data into a training set and a test set. This lets you evaluate the model after training is complete so you can see how accurate it is.

The query below shows the CREATE OR REPLACE MODEL statement for training the model. The OPTIONS clause and the final SELECT are reconstructed here to make the statement complete; the training table name bqmlga4.train is illustrative:

    CREATE OR REPLACE MODEL bqmlga4.churn_logreg

    TRANSFORM(
      EXTRACT(MONTH from user_first_engagement) as month,
      EXTRACT(DAYOFYEAR from user_first_engagement) as julianday,
      EXTRACT(DAYOFWEEK from user_first_engagement) as dayofweek,
      EXTRACT(HOUR from user_first_engagement) as hour,
      * EXCEPT(user_first_engagement, user_pseudo_id)
    )

    OPTIONS(
      MODEL_TYPE = "LOGISTIC_REG",
      INPUT_LABEL_COLS = ["churned"]
    ) AS

    SELECT
      *
    FROM
      bqmlga4.train  -- the prepared training data; this table name is illustrative


This query extracts month, julianday, dayofweek, and hour values from the user_first_engagement timestamp column, as one example of additional feature preprocessing that you can do before training. Wrapping these expressions in the TRANSFORM clause of the CREATE MODEL query allows the model to retain the extracted values, so when you use this model to make predictions later on, these values don't have to be extracted again.

View the notebook to see examples of how to train XGBoost, deep neural network, and AutoML Tables models.

Evaluating the model

Once the model has finished training, evaluate it to see how it performed.

This solution uses the ML.EVALUATE function to generate precision, recall, accuracy, and f1_score metrics for the model. Training multiple models and then comparing their evaluation metrics can help you decide which model works best with your data.
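
For example, to evaluate the model on the automatically held-out evaluation data:

```
-- Evaluate the trained model on the held-out evaluation split.
SELECT
  precision, recall, accuracy, f1_score, log_loss, roc_auc
FROM
  ML.EVALUATE(MODEL bqmlga4.churn_logreg)
```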

This solution also uses a confusion matrix to inspect how well the model predicted the labels, compared to the actual labels. The confusion matrix is created using the default threshold of 0.5, which you may want to adjust to optimize for recall, precision, or a balance of the two. You can use the optional THRESHOLD parameter to modify the default threshold.
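
For example, to generate the confusion matrix at a custom threshold (0.6 here is an illustrative value, not a recommendation):

```
-- Confusion matrix at a custom classification threshold.
SELECT
  *
FROM
  ML.CONFUSION_MATRIX(MODEL bqmlga4.churn_logreg,
                      STRUCT(0.6 AS threshold))
```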

Getting predictions

Once the model is trained and evaluated, you can use it to get predictions.

The solution uses the ML.PREDICT function to request predictions from the model.

For propensity modeling, the prediction output is the probability of a behavior occurring. Because the model predicts the churned label, the closer the predicted probability is to 1, the more likely the user is to churn, and the closer it is to 0, the more likely the user is to return.
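
A sketch of requesting predictions and extracting the probability associated with the churned=1 label (the input table name bqmlga4.train is an assumption):

```
-- Illustrative: score users and pull out the churn probability.
SELECT
  user_pseudo_id,
  predicted_churned,
  -- probability associated with the churned = 1 label
  (SELECT prob
   FROM UNNEST(predicted_churned_probs)
   WHERE label = 1) AS churn_probability
FROM
  ML.PREDICT(MODEL bqmlga4.churn_logreg,
             (SELECT * FROM bqmlga4.train))  -- input table name is an assumption
```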

Exporting predictions

Once you have predictions from the model, you can use this data for marketing activation. Common ways to do this are to export the data for use in Analytics or Firebase.

Using predictions in Google Analytics

You can use prediction data for marketing campaigns in Analytics. You can import the model predictions back into Analytics as a user attribute by using the Data Import feature for Google Analytics 4. Based on the prediction values, you can create and edit audiences and use them for audience targeting. For example, you can define an audience based on users with prediction probabilities between 0.4 and 0.7, which represents users who might not return on their own but could be incentivized to do so.
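
For example, assuming the predictions were saved to a hypothetical bqmlga4.predictions table with a churn_probability column, selecting that audience could look like this:

```
-- Illustrative: select users in the 0.4-0.7 churn-probability band
-- for export; the predictions table and column names are assumptions.
SELECT
  user_pseudo_id,
  churn_probability
FROM
  bqmlga4.predictions
WHERE
  churn_probability BETWEEN 0.4 AND 0.7
```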

Using predictions in Firebase

For Firebase applications, you can use the Import segments feature to import prediction data. With this data, you can tailor the user experience by targeting your identified users through Firebase services such as Remote Config, Cloud Messaging, and In-App Messaging. This allows you to do things like send notifications to users, configure the app for users, and follow user journeys across devices.

Continuous model evaluation

As more data is generated from your users, you might want to regularly evaluate your model on fresh data and re-train the model if you notice that the model quality is declining.

This process of ensuring a production machine learning model is still performing well on new data is called continuous evaluation, and it is an essential part of any ML workflow. Performing continuous evaluation can help you catch model drift, a phenomenon that occurs when the data used to train your model no longer reflects the data you are using to request predictions.

To learn more about how to do continuous model evaluation, see Continuous model evaluation with BigQuery ML, Stored Procedures, and Cloud Scheduler.

What's next