Prepare to generate prediction outputs

This page describes how to prepare the datasets needed to generate prediction outputs.

Before you begin

Before you begin, you need a trained model resource.

Create a dataset for prediction

You can create predictions using an existing dataset (for example, the one you were using for backtesting). However, in a production environment, we recommend that you create a new dataset for each prediction run:

  1. You are responsible for tracking lineage from dataset to model. To ensure that the data remains unchanged, we recommend that you create snapshots of your BigQuery tables (rather than plain copies) and use those snapshots in your AML AI dataset. If you don't snapshot the data, AML AI operations read the BigQuery tables each time an operation uses the dataset, so changes to the underlying tables could affect tuning, training, backtesting, and predictions.
  2. Follow the guidance under Prepare Data for AML AI to create a separate AML AI dataset for prediction from the table snapshots you created in step 1. To create the BigQuery datasets and tables, you can use the commands in Prepare BigQuery datasets and tables.
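
The snapshot step above can be sketched as a small helper that builds one BigQuery `CREATE SNAPSHOT TABLE ... CLONE ...` statement per input table. This is only an illustration: the project, dataset, and table names are hypothetical placeholders, and the date suffix is one possible naming convention for tying snapshots to a prediction run, not something AML AI requires. The helper only produces the SQL text. Running the statements (for example, in the BigQuery console) is left to you.

```python
from datetime import date


def snapshot_statements(project, source_dataset, snapshot_dataset, tables, run_date):
    """Build one CREATE SNAPSHOT TABLE statement per source table.

    All names passed in are placeholders chosen by the caller. The snapshot
    name gets a YYYYMMDD suffix so each prediction run has its own snapshots.
    """
    suffix = run_date.strftime("%Y%m%d")
    statements = []
    for table in tables:
        source = f"`{project}.{source_dataset}.{table}`"
        snapshot = f"`{project}.{snapshot_dataset}.{table}_{suffix}`"
        statements.append(f"CREATE SNAPSHOT TABLE {snapshot} CLONE {source};")
    return statements


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in snapshot_statements(
        project="my-project",
        source_dataset="aml_input",
        snapshot_dataset="aml_snapshots",
        tables=["party", "transaction"],
        run_date=date(2024, 1, 31),
    ):
        print(statement)
```

Keeping the generated statements alongside the prediction run gives you a simple record of which snapshots fed which AML AI dataset, which helps with the lineage tracking described in step 1.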

Prepare the output destinations

AML AI generates prediction outputs (risk scores and explainability) when you create a prediction results resource.

Prediction outputs are sent to BigQuery. Before creating prediction results, you must create a BigQuery dataset for these outputs. Any BigQuery dataset can be used for prediction outputs, as long as the correct permissions are granted and the dataset is in the same project where the API is enabled.
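
As a minimal sketch of the same-project requirement, the helper below builds a BigQuery `CREATE SCHEMA` statement for the output dataset and refuses to do so when the dataset's project differs from the project where the API is enabled. The function name, the project and dataset IDs, and the location value are hypothetical and chosen for illustration. The sketch does not grant any permissions, so you still need to set those up separately.

```python
def output_dataset_statement(api_project, dataset_project, dataset_id, location):
    """Return a CREATE SCHEMA statement for the prediction output dataset.

    Raises ValueError if the dataset would live in a different project from
    the one where the API is enabled, mirroring the requirement above.
    """
    if dataset_project != api_project:
        raise ValueError(
            f"Output dataset project {dataset_project!r} must match "
            f"the API project {api_project!r}."
        )
    return (
        f"CREATE SCHEMA IF NOT EXISTS `{dataset_project}.{dataset_id}` "
        f'OPTIONS (location = "{location}");'
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(output_dataset_statement("my-project", "my-project", "aml_outputs", "us-central1"))
```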

Generate risk scores and explainability

Now that you have the dataset for prediction, a trained model resource, and a BigQuery dataset for output, you can create prediction results. To do this, see Create and manage prediction results.