This page shows you how to get batch predictions from your custom trained models using the Google Cloud console or the Vertex AI API.
To make a batch prediction request, you specify an input source and an output location, either Cloud Storage or BigQuery, where Vertex AI stores the batch prediction results.
Limitations and requirements
Consider the following limitations and requirements when getting batch predictions:
- To minimize processing time, your input and output locations must be in the same region or multi-region. For example, if your input is in us-central1, your output can be in us-central1 or US, but not europe-west4. To learn more, see Cloud Storage locations and BigQuery locations.
- Your input and output must also be in the same region or multi-region as your model.
- Model Garden models are not supported.
- BigQuery ML models aren't custom trained models. However, the information in this page can be used to get batch predictions from a BigQuery ML model under the following conditions:
  - The BigQuery ML model must be registered with Vertex AI Model Registry.
  - To use a BigQuery table as input, you must set InstanceConfig.instanceType to "object" using the Vertex AI API.
Input data requirements
The input for batch requests specifies the items to send to your model for prediction. We support the following input formats:
JSON Lines
Use a JSON Lines file to specify a list of input instances to make predictions about. Store the file in a Cloud Storage bucket.
Example 1
The following example shows a JSON Lines file where each line contains an array:
[1, 2, 3, 4]
[5, 6, 7, 8]
Here is what is sent to the prediction container in the HTTP request body:
All other containers
{"instances": [ [1, 2, 3, 4], [5, 6, 7, 8] ]}
PyTorch containers
{"instances": [ { "data": [1, 2, 3, 4] }, { "data": [5, 6, 7, 8] } ]}
Example 2
The following example shows a JSON Lines file where each line contains an object.
{ "values": [1, 2, 3, 4], "key": 1 }
{ "values": [5, 6, 7, 8], "key": 2 }
Here is what is sent to the prediction container in the HTTP request body. Note that the same request body is sent to all containers.
{"instances": [
{ "values": [1, 2, 3, 4], "key": 1 },
{ "values": [5, 6, 7, 8], "key": 2 }
]}
Example 3
For PyTorch prebuilt containers, make sure that you wrap each instance in a
data
field as required by TorchServe's default handler; Vertex AI
doesn't wrap your instances for you. For example:
{ "data": { "values": [1, 2, 3, 4], "key": 1 } }
{ "data": { "values": [5, 6, 7, 8], "key": 2 } }
Here is what is sent to the prediction container in the HTTP request body:
{"instances": [
{ "data": { "values": [1, 2, 3, 4], "key": 1 } },
{ "data": { "values": [5, 6, 7, 8], "key": 2 } }
]}
TFRecord
Save input instances in the TFRecord format. You can optionally compress the TFRecord files with Gzip. Store the TFRecord files in a Cloud Storage bucket.
Vertex AI reads each instance in your TFRecord files as binary and then base64-encodes it as a JSON object with a single key named b64.
Here is what is sent to the prediction container in the HTTP request body:
All other containers
{"instances": [ { "b64": "b64EncodedASCIIString" }, { "b64": "b64EncodedASCIIString" } ]}
PyTorch containers
{"instances": [ { "data": {"b64": "b64EncodedASCIIString" } }, { "data": {"b64": "b64EncodedASCIIString" } } ]}
Make sure your prediction container knows how to decode the instance.
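For example, a custom prediction handler might base64-decode the payload before running inference. The following is a minimal sketch, assuming a hypothetical handler function that receives the raw request body:

```python
import base64
import json

def handle_request(request_body: str):
    """Hypothetical handler: recover the raw bytes of each TFRecord instance."""
    instances = json.loads(request_body)["instances"]
    decoded = []
    for instance in instances:
        # In PyTorch prebuilt containers the payload is nested under "data".
        payload = instance.get("data", instance)
        decoded.append(base64.b64decode(payload["b64"]))
    # decoded now holds the serialized records; parse them with whatever
    # record format your model expects before calling predict.
    return decoded
```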
CSV
Specify one input instance per row in a CSV file. The first row must be a header row. You must enclose all strings in double quotation marks ("). Vertex AI doesn't accept cell values that contain newlines. Non-quoted values are read as floating point numbers.
The following example shows a CSV file with two input instances:
"input1","input2","input3"
0.1,1.2,"cat1"
4.0,5.0,"cat2"
Here is what is sent to the prediction container in the HTTP request body:
All other containers
{"instances": [ [0.1,1.2,"cat1"], [4.0,5.0,"cat2"] ]}
PyTorch containers
{"instances": [ { "data": [0.1,1.2,"cat1"] }, { "data": [4.0,5.0,"cat2"] } ]}
File list
Create a text file where each row is the Cloud Storage URI to a file. Vertex AI reads the contents of each file as binary and then base64-encodes each one as a JSON object with a single key named b64.
If you plan to use the Google Cloud console to get batch predictions, paste your file list directly into the Google Cloud console. Otherwise, save the list in a Cloud Storage bucket.
The following example shows a file list with two input instances:
gs://path/to/image/image1.jpg
gs://path/to/image/image2.jpg
Here is what is sent to the prediction container in the HTTP request body:
All other containers
{ "instances": [ { "b64": "b64EncodedASCIIString" }, { "b64": "b64EncodedASCIIString" } ]}
PyTorch containers
{ "instances": [ { "data": { "b64": "b64EncodedASCIIString" } }, { "data": { "b64": "b64EncodedASCIIString" } } ]}
Make sure your prediction container knows how to decode the instance.
BigQuery
Specify a BigQuery table as projectId.datasetId.tableId.
Vertex AI transforms each row from the table to a JSON instance.
For example, if your table contains the following:
| Column 1 | Column 2 | Column 3 |
|---|---|---|
| 1.0 | 3.0 | "Cat1" |
| 2.0 | 4.0 | "Cat2" |
Here is what is sent to the prediction container in the HTTP request body:
All other containers
{"instances": [ [1.0,3.0,"cat1"], [2.0,4.0,"cat2"] ]}
PyTorch containers
{"instances": [ { "data": [1.0,3.0,"cat1"] }, { "data": [2.0,4.0,"cat2"] } ]}
Here is how BigQuery data types are converted to JSON:
| BigQuery Type | JSON Type | Example value |
|---|---|---|
| String | String | "abc" |
| Integer | Integer | 1 |
| Float | Float | 1.2 |
| Numeric | Float | 4925.000000000 |
| Boolean | Boolean | true |
| Timestamp | String | "2019-01-01 23:59:59.999999+00:00" |
| Date | String | "2018-12-31" |
| Time | String | "23:59:59.999999" |
| DateTime | String | "2019-01-01T00:00:00" |
| Record | Object | { "A": 1, "B": 2 } |
| Repeated Type | Array[Type] | [1, 2] |
| Nested Record | Object | { "A": { "a": 0 }, "B": 1 } |
Partition data
Batch prediction uses MapReduce to shard the input to each replica. To make use of the MapReduce features, the input should be partitionable.
Vertex AI automatically partitions BigQuery, file list, and JSON lines input.
Vertex AI does not automatically partition CSV files because they are not naturally partition-friendly: rows are not self-descriptive or typed, and they may contain newlines. We recommend against using CSV input for throughput-sensitive applications.
For TFRecord input, make sure you manually partition the data by splitting the instances into smaller files and passing the files to the job with a wildcard (for example, gs://my-bucket/*.tfrecord). The number of files should be at least the number of replicas specified.
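As an illustration, the following sketch uses TensorFlow to split an existing TFRecord file into a fixed number of shards; the file names and shard count are placeholders:

```python
import tensorflow as tf

NUM_SHARDS = 10  # should be at least the number of replicas for the job
SOURCE = "all_instances.tfrecord"  # hypothetical unsharded source file

writers = [
    tf.io.TFRecordWriter(f"instances-{i:05d}-of-{NUM_SHARDS:05d}.tfrecord")
    for i in range(NUM_SHARDS)
]
# Distribute each serialized record round-robin across the shard files.
for i, record in enumerate(tf.data.TFRecordDataset(SOURCE)):
    writers[i % NUM_SHARDS].write(record.numpy())
for writer in writers:
    writer.close()
```

Upload the shards to Cloud Storage and reference them with a wildcard such as gs://my-bucket/*.tfrecord.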
Filter and transform input data
You can filter and transform your batch input by specifying instanceConfig in your BatchPredictionJob request.
Filtering lets you either exclude certain fields that are in the input data from your prediction request, or include only a subset of fields from the input data in your prediction request, without having to do any custom pre-processing or post-processing in the prediction container. This is useful when your input data file has extra columns that the model doesn't need, such as keys or additional data.
Transforming lets you send the instances to your prediction container in either a JSON array or object format. See instanceType for more information.
For example, if your input table contains the following:
| customerId | col1 | col2 |
|---|---|---|
| 1001 | 1 | 2 |
| 1002 | 5 | 6 |
and you specify the following instanceConfig:
{
  "name": "batchJob1",
  ...
  "instanceConfig": {
    "excludedFields": ["customerId"],
    "instanceType": "object"
  }
}
The instances in your prediction request are then sent as JSON objects, and the customerId column is excluded:
{"col1":1,"col2":2}
{"col1":5,"col2":6}
Note that specifying the following instanceConfig would yield the same result:
{
  "name": "batchJob1",
  ...
  "instanceConfig": {
    "includedFields": ["col1","col2"],
    "instanceType": "object"
  }
}
For a demonstration on how to use feature filters, see the Custom model batch prediction with feature filtering notebook.
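If you create the job programmatically instead of through raw REST, the same instanceConfig fields are available on the BatchPredictionJob resource. The following is a minimal sketch using the lower-level google-cloud-aiplatform client; the project, model, and table names are placeholders, and you should verify the field names against the client version you have installed:

```python
from google.cloud import aiplatform_v1

# Placeholder identifiers; replace with your own values.
PROJECT = "PROJECT_ID"
LOCATION = "us-central1"
MODEL = f"projects/{PROJECT}/locations/{LOCATION}/models/MODEL_ID"

client = aiplatform_v1.JobServiceClient(
    client_options={"api_endpoint": f"{LOCATION}-aiplatform.googleapis.com"}
)

job = aiplatform_v1.BatchPredictionJob(
    display_name="batchJob1",
    model=MODEL,
    input_config=aiplatform_v1.BatchPredictionJob.InputConfig(
        instances_format="bigquery",
        bigquery_source=aiplatform_v1.BigQuerySource(
            input_uri=f"bq://{PROJECT}.DATASET.INPUT_TABLE"
        ),
    ),
    # Drop the customerId column and send each row as a JSON object.
    instance_config=aiplatform_v1.BatchPredictionJob.InstanceConfig(
        instance_type="object",
        excluded_fields=["customerId"],
    ),
    output_config=aiplatform_v1.BatchPredictionJob.OutputConfig(
        predictions_format="bigquery",
        bigquery_destination=aiplatform_v1.BigQueryDestination(
            output_uri=f"bq://{PROJECT}.DATASET.OUTPUT_TABLE"
        ),
    ),
    dedicated_resources=aiplatform_v1.BatchDedicatedResources(
        machine_spec=aiplatform_v1.MachineSpec(machine_type="n1-standard-4"),
        starting_replica_count=1,
        max_replica_count=1,
    ),
)

client.create_batch_prediction_job(
    parent=f"projects/{PROJECT}/locations/{LOCATION}", batch_prediction_job=job
)
```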
Request a batch prediction
For batch prediction requests, you can use the Google Cloud console or the Vertex AI API. Depending on the number of input items that you've submitted, a batch prediction task can take some time to complete.
When you request a batch prediction, the prediction container runs as the user-provided custom service account. The read-write operations, such as reading the prediction instances from the data source or writing the prediction results, are done using the Vertex AI service agent, which by default has access to BigQuery and Cloud Storage.
Google Cloud console
Use the Google Cloud console to request a batch prediction.
1. In the Google Cloud console, in the Vertex AI section, go to the Batch predictions page.

   Go to the Batch predictions page

2. Click Create to open the New batch prediction window.
3. For Define your batch prediction, complete the following steps:
   1. Enter a name for the batch prediction.
   2. For Model name, select the name of the model to use for this batch prediction.
   3. For Select source, select the source that applies to your input data:
      - If you have formatted your input as JSON Lines, CSV, or TFRecord, select File on Cloud Storage (JSON Lines, CSV, TFRecord, TFRecord Gzip). Then specify your input file in the Source path field.
      - If you are using a file list as input, select Files on Cloud Storage (other) and paste your file list into the following field.
      - For BigQuery input, select BigQuery path. If you select BigQuery as input, you must also select BigQuery as output and Google-managed encryption key. Customer-managed encryption key (CMEK) isn't supported with BigQuery as input/output.
   4. In the Destination path field, specify the Cloud Storage directory where you want Vertex AI to store batch prediction output.
   5. Optionally, you can check Enable feature attributions for this model to get feature attributions as part of the batch prediction response. Then click Edit to configure explanation settings. (Editing the explanation settings is optional if you previously configured explanation settings for the model, and required otherwise.)
   6. Specify compute options for the batch prediction job: Number of compute nodes, Machine type, and (optionally) Accelerator type and Accelerator count.
4. Optional: Model Monitoring analysis for batch predictions is available in Preview. See the Prerequisites for adding skew detection configuration to your batch prediction job.
   1. Click to toggle on Enable model monitoring for this batch prediction.
   2. Select a Training data source. Enter the data path or location for the training data source that you selected.
   3. Optional: Under Alert thresholds, specify thresholds at which to trigger alerts.
   4. For Notification emails, enter one or more comma-separated email addresses to receive alerts when a model exceeds an alerting threshold.
   5. Optional: For Notification channels, add Cloud Monitoring channels to receive alerts when a model exceeds an alerting threshold. You can select existing Cloud Monitoring channels or create a new one by clicking Manage notification channels. The Google Cloud console supports PagerDuty, Slack, and Pub/Sub notification channels.
5. Click Create.
API
Use the Vertex AI API to send batch prediction requests. Select a tab depending on which tool you are using to get batch predictions.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where Model is stored and batch prediction job is executed. For example, us-central1.
- PROJECT_ID: Your project ID.
- BATCH_JOB_NAME: Display name for the batch prediction job.
- MODEL_ID: The ID for the model to use for making predictions.
- INPUT_FORMAT: The format of your input data: jsonl, csv, tf-record, tf-record-gzip, or file-list.
- INPUT_URI: Cloud Storage URI of your input data. May contain wildcards.
- OUTPUT_DIRECTORY: Cloud Storage URI of a directory where you want Vertex AI to save output.
- MACHINE_TYPE: The machine resources to be used for this batch prediction job. You can optionally configure the machineSpec field to use accelerators, but the following example does not demonstrate this.
- BATCH_SIZE: The number of instances to send in each prediction request; the default is 64. Increasing the batch size can lead to higher throughput, but it can also cause request timeouts.
- STARTING_REPLICA_COUNT: The number of nodes for this batch prediction job.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs
Request JSON body:
{ "displayName": "BATCH_JOB_NAME", "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID", "inputConfig": { "instancesFormat": "INPUT_FORMAT", "gcsSource": { "uris": ["INPUT_URI"], }, }, "outputConfig": { "predictionsFormat": "jsonl", "gcsDestination": { "outputUriPrefix": "OUTPUT_DIRECTORY", }, }, "dedicatedResources" : { "machineSpec" : { "machineType": MACHINE_TYPE }, "startingReplicaCount": STARTING_REPLICA_COUNT }, "manualBatchTuningParameters": { "batch_size": BATCH_SIZE, } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/batchPredictionJobs/BATCH_JOB_ID", "displayName": "BATCH_JOB_NAME 202005291958", "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID", "inputConfig": { "instancesFormat": "jsonl", "gcsSource": { "uris": [ "INPUT_URI" ] } }, "outputConfig": { "predictionsFormat": "jsonl", "gcsDestination": { "outputUriPrefix": "OUTPUT_DIRECTORY" } }, "state": "JOB_STATE_PENDING", "createTime": "2020-05-30T02:58:44.341643Z", "updateTime": "2020-05-30T02:58:44.341643Z", }
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
In the following sample, replace PREDICTIONS_FORMAT with jsonl. To learn how to replace the other placeholders, see the REST tab of this section.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
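As an illustration of the SDK flow, a batch prediction request might look like the following sketch. The project, region, model ID, and bucket paths are placeholders; check the SDK reference for the full set of batch_predict parameters:

```python
from google.cloud import aiplatform

# Placeholder values; replace with your own project, region, model, and paths.
aiplatform.init(project="PROJECT_ID", location="us-central1")

model = aiplatform.Model("MODEL_ID")

batch_prediction_job = model.batch_predict(
    job_display_name="BATCH_JOB_NAME",
    instances_format="jsonl",
    predictions_format="jsonl",
    gcs_source="gs://SOURCE_BUCKET/input/instances.jsonl",
    gcs_destination_prefix="gs://DESTINATION_BUCKET/output/",
    machine_type="n1-standard-2",
    starting_replica_count=1,
    sync=True,  # block until the job finishes
)

print(batch_prediction_job.display_name)
print(batch_prediction_job.state)
```

For BigQuery input and output, the SDK's batch_predict also accepts bigquery_source and bigquery_destination_prefix arguments in place of the Cloud Storage ones.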
BigQuery
The preceding REST example uses Cloud Storage for the source and destination. To use BigQuery instead, make the following changes:
- Change the inputConfig field to the following:

  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://SOURCE_PROJECT_ID.SOURCE_DATASET_NAME.SOURCE_TABLE_NAME"
    }
  }

- Change the outputConfig field to the following:

  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://DESTINATION_PROJECT_ID.DESTINATION_DATASET_NAME.DESTINATION_TABLE_NAME"
    }
  }

Replace the following:

- SOURCE_PROJECT_ID: ID of the source Google Cloud project
- SOURCE_DATASET_NAME: name of the source BigQuery dataset
- SOURCE_TABLE_NAME: name of the BigQuery source table
- DESTINATION_PROJECT_ID: ID of the destination Google Cloud project
- DESTINATION_DATASET_NAME: name of the destination BigQuery dataset
- DESTINATION_TABLE_NAME: name of the BigQuery destination table
Feature importance
If you want feature importance values returned for your predictions, set the generateExplanation property to true. Note that forecasting models don't support feature importance, so you can't include it in your batch prediction requests.
Feature importance, sometimes called feature attributions, is part of Vertex Explainable AI.
You can only set generateExplanation to true if you have configured your Model for explanations or if you specify the BatchPredictionJob's explanationSpec field.
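With the Vertex AI SDK for Python, the corresponding argument on batch_predict is generate_explanation. The following is a minimal sketch with placeholder names, assuming the model was configured for explanations when it was uploaded:

```python
from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="us-central1")  # placeholders
model = aiplatform.Model("MODEL_ID")

job = model.batch_predict(
    job_display_name="BATCH_JOB_NAME",
    gcs_source="gs://SOURCE_BUCKET/input/instances.jsonl",
    gcs_destination_prefix="gs://DESTINATION_BUCKET/output/",
    machine_type="n1-standard-2",
    starting_replica_count=1,
    generate_explanation=True,  # maps to the generateExplanation API field
)
```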
Choose machine type and replica count
Scaling horizontally by increasing the number of replicas improves throughput more linearly and predictably than by using larger machine types.
In general, we recommend that you specify the smallest machine type possible for your job and increase the number of replicas.
For cost-effectiveness, we recommend that you choose the replica count such that your batch prediction job runs for at least 10 minutes. This is because you are billed per replica node hour, which includes the approximately 5 minutes it takes for each replica to start up. It isn't cost-effective to process for only a few seconds and then shut down.
As general guidance, for thousands of instances, we recommend a starting_replica_count in the tens. For millions of instances, we recommend a starting_replica_count in the hundreds. You can also use the following formula to estimate the number of replicas:
N / (T * (60 / Tb))
Where:
- N: The number of batches in the job. For example, 1 million instances / 100 batch size = 10,000 batches.
- T: The expected time for the batch prediction job. For example, 10 minutes.
- Tb: The time in seconds it takes for a replica to process a single batch. For example, 1 second per batch on a 2-core machine type.
In our example, 10,000 batches / (10 minutes * (60 / 1s)) rounds up to 17 replicas.
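As a quick check of that arithmetic, here is a small helper (an illustrative sketch, not part of any Vertex AI API) that applies the formula:

```python
import math

def estimate_replicas(num_instances, batch_size, target_minutes, seconds_per_batch):
    """Apply N / (T * (60 / Tb)) and round up to a whole number of replicas."""
    num_batches = math.ceil(num_instances / batch_size)
    return math.ceil(num_batches / (target_minutes * (60 / seconds_per_batch)))

# The example from the text: 1,000,000 instances, a batch size of 100,
# a 10 minute target, and 1 second per batch -> 17 replicas.
print(estimate_replicas(1_000_000, 100, 10, 1))
```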
Unlike online prediction, batch prediction jobs don't autoscale. Because all of the input data is known up front, the system partitions the data to each replica when the job starts. The system uses the starting_replica_count parameter. The max_replica_count parameter is ignored.
These recommendations are all approximate guidelines. They don't necessarily give optimal throughput for every model. They don't provide exact predictions of processing time and cost. And they don't necessarily capture the best cost-versus-throughput tradeoffs for each scenario. Use them as a reasonable starting point and adjust them as necessary. To measure characteristics such as throughput for your model, run the Finding ideal machine type notebook.
For GPU- or TPU-accelerated machines
Follow the preceding guidelines (which also apply to CPU-only models), with the following additional considerations:
- You might need more CPUs and GPUs (for example, for data preprocessing).
- GPU machine types take more time to start up (10 minutes), so you may want to target longer times (for example, at least 20 minutes instead of 10 minutes) for the batch prediction job so that a reasonable proportion of the time and cost is spent on generating predictions.
Retrieve batch prediction results
When a batch prediction task is complete, the output of the prediction is stored in the Cloud Storage bucket or BigQuery location that you specified in your request.
Example batch prediction result
The output folder contains a set of JSON Lines files. The files are named {gcs_path}/prediction.results-{file_number}-of-{number_of_files_generated}. The number of files is nondeterministic, due to the distributed nature of batch prediction.
Each line in the file corresponds to an instance from the input and has the following key-value pairs:
- prediction: contains the value returned by the prediction container.
- instance: For a file list, contains the Cloud Storage URI. For all other input formats, contains the value that was sent to the prediction container in the HTTP request body.
Example 1
If the HTTP request contains:
{
"instances": [
[1, 2, 3, 4],
[5, 6, 7, 8]
]}
And the prediction container returns:
{
"predictions": [
[0.1,0.9],
[0.7,0.3]
]
}
Then the JSON Lines output file is:
{ "instance": [1, 2, 3, 4], "prediction": [0.1,0.9]}
{ "instance": [5, 6, 7, 8], "prediction": [0.7,0.3]}
Example 2
If the HTTP request contains:
{
"instances": [
{"values": [1, 2, 3, 4], "key": 1},
{"values": [5, 6, 7, 8], "key": 2}
]}
And the prediction container returns:
{
"predictions": [
{"result":1},
{"result":0}
]
}
Then the JSON Lines output file is:
{ "instance": {"values": [1, 2, 3, 4], "key": 1}, "prediction": {"result":1}}
{ "instance": {"values": [5, 6, 7, 8], "key": 2}, "prediction": {"result":0}}
Use Explainable AI
We don't recommend running feature-based explanations on a large amount of data. This is because each input can potentially fan out to thousands of requests based on the set of possible feature values, which may result in massively increased processing time and cost. In general, a small dataset is enough to understand feature importance.
Batch prediction does not support example-based explanations.
Notebooks
What's next
- Learn about Compute resources for prediction.