Batch prediction is a technique for applying machine learning models to large datasets efficiently. Instead of processing individual data points, you can submit a batch of data to Gemini for prediction, saving time and computational resources. Unlike online prediction, where you are limited to one input prompt at a time, you can send a large number of multimodal prompts in a single batch request. Responses are then written asynchronously to your BigQuery or Cloud Storage output location.
Batch requests for Gemini models are discounted 50% from standard requests. To learn more, see the Pricing page.
Batch prediction use case
Consider an online bookstore with thousands of books in its database. Instead of generating descriptions individually for each book, which would be time-consuming, this store can use Gemini batch prediction to process all book information at once. This approach dramatically improves efficiency by reducing the overall processing time and minimizing the computational resources required.
Batch prediction can also improve consistency with automation. By processing all descriptions simultaneously, the model maintains a uniform tone and style across book descriptions, reinforcing brand identity. This bookstore can also integrate batch prediction into their workflow to automatically generate descriptions for new book entries, eliminating manual effort and ensuring their website remains up-to-date with minimal human intervention.
Gemini models that support batch predictions
The following Gemini models support batch predictions:
- gemini-1.5-flash-002
- gemini-1.5-flash-001
- gemini-1.5-pro-002
- gemini-1.5-pro-001
- gemini-1.0-pro-vision-001
- gemini-1.0-pro-002
- gemini-1.0-pro-001
Batch requests for Gemini models accept BigQuery storage sources and Cloud Storage sources. You can independently choose to output predictions to either a BigQuery table or a JSONL file in a Cloud Storage bucket.
Batch prediction for Cloud Storage
Prepare your inputs
Cloud Storage input
- File format: JSON Lines (JSONL)
- Located in us-central1
- Must have appropriate Cloud Storage permissions for the service account. To grant the service account read and write permission on a Cloud Storage bucket, use the gcloud projects add-iam-policy-binding command as follows:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectUser"
Replace the following values:
- PROJECT_ID: The project that your service account was created in.
- SERVICE_ACCOUNT_ID: The ID for the service account.
The following Gemini models support fileData:
- gemini-1.5-flash-002
- gemini-1.5-flash-001
- gemini-1.5-pro-002
- gemini-1.5-pro-001
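As a rough illustration of the JSONL input format, the sketch below writes one request per line. It assumes each line wraps the prompt in a top-level request object with a contents list, mirroring the request field echoed in the Cloud Storage output example later on this page. The helper name make_request_line and the sample prompts are hypothetical.

```python
import json


def make_request_line(prompt, file_uri=None, mime_type=None):
    """Return one JSONL line for a single user prompt (hypothetical helper)."""
    parts = [{"text": prompt}]
    if file_uri is not None:
        # fileData is only accepted by the models listed above.
        parts.append({"fileData": {"fileUri": file_uri, "mimeType": mime_type}})
    # Assumption: the batch input wraps the prompt in a top-level "request" key.
    record = {"request": {"contents": [{"role": "user", "parts": parts}]}}
    # json.dumps emits a single line, which is what JSONL requires.
    return json.dumps(record)


lines = [
    make_request_line("Write a one-sentence description of a mystery novel."),
    make_request_line(
        "Describe this image.",
        file_uri="gs://cloud-samples-data/generative-ai/image/cricket.jpeg",
        mime_type="image/jpeg",
    ),
]
# One JSON object per line, terminated by a newline.
jsonl_text = "\n".join(lines) + "\n"
```

The resulting text can be saved as a .jsonl file and uploaded to the input bucket.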
Request a batch prediction job
Specify your Cloud Storage input file, model, and output location.
REST
To create a batch prediction job, use the projects.locations.batchPredictionJobs.create method.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports Gemini models.
- PROJECT_ID: Your project ID.
- INPUT_URI: The Cloud Storage location of your JSONL batch prediction input, such as gs://bucketname/path/to/file.jsonl.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. The region of the output BigQuery dataset must be the same as that of the Vertex AI batch prediction job. For Cloud Storage, specify the bucket and directory location, such as gs://mybucket/path/to/output.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{
  "displayName": "my-cloud-storage-batch-prediction-job",
  "model": "publishers/google/models/gemini-1.5-flash-002",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
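The placeholder substitutions in the request body follow a fixed pattern, so they can be derived from the URI schemes. The following is a minimal sketch of that mapping, not an official client. The function name build_batch_request is hypothetical, and the field names are copied from the request bodies shown on this page.

```python
import json


def build_batch_request(display_name, model, input_uri, output_uri):
    """Build the JSON body for batchPredictionJobs.create (illustrative sketch)."""
    if input_uri.startswith("gs://"):
        input_config = {"instancesFormat": "jsonl", "gcsSource": {"uris": input_uri}}
    elif input_uri.startswith("bq://"):
        input_config = {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        }
    else:
        raise ValueError("input_uri must start with gs:// or bq://")

    # OUTPUT_FORMAT, DESTINATION, and OUTPUT_URI_FIELD_NAME all depend on the
    # output type, so they are chosen together.
    if output_uri.startswith("bq://"):
        output_config = {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        }
    elif output_uri.startswith("gs://"):
        output_config = {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri},
        }
    else:
        raise ValueError("output_uri must start with gs:// or bq://")

    return {
        "displayName": display_name,
        "model": model,
        "inputConfig": input_config,
        "outputConfig": output_config,
    }


body = build_batch_request(
    "my-cloud-storage-batch-prediction-job",
    "publishers/google/models/gemini-1.5-flash-002",
    "gs://bucketname/path/to/file.jsonl",
    "gs://mybucket/path/to/output",
)
request_json = json.dumps(body, indent=2)
```

Writing request_json to request.json gives a body that can be sent with the curl or PowerShell commands on this page.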
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response that includes a unique identifier for the batch job.
You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:
curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
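The polling step can be sketched as a small loop. This example assumes the job's identifier is the last segment of the name field in the create response, and it treats JOB_STATE_FAILED and JOB_STATE_CANCELLED as terminal states in addition to JOB_STATE_SUCCEEDED. The function names are hypothetical, and fetch_state stands in for whatever code issues the GET request and reads the job state from the response.

```python
import time

# JOB_STATE_SUCCEEDED comes from this page; the other two terminal states are
# an assumption about the job state values.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def job_id_from_name(name):
    """Extract BATCH_JOB_ID from a resource name ending in /batchPredictionJobs/ID."""
    return name.rsplit("/", 1)[-1]


def wait_for_job(fetch_state, interval_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Call fetch_state() until it returns a terminal state, then return that state."""
    for attempt in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError("job did not reach a terminal state in time")


# Usage with a fake state source, so the loop can be exercised offline.
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = wait_for_job(lambda: next(fake_states), sleep=lambda _seconds: None)
```

Passing the sleep function as a parameter keeps the loop testable. In real use, the default time.sleep applies.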
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Batch prediction output
When a batch prediction task completes, the output is stored in the Cloud Storage bucket or BigQuery table that you specified in your request. For rows that succeed, model responses are stored in the response column. Otherwise, error details are stored in the status column for further inspection.
During long-running jobs, completed predictions are continuously exported to the specified output destination. This begins after 90 minutes. If the batch prediction job is canceled or fails, all completed predictions are exported.
Cloud Storage output example
{
"status": "",
"processed_time": "2024-11-01T18:13:16.826+00:00",
"request": {
"contents": [
{
"parts": [
{
"fileData": null,
"text": "What is the relation between the following video and image samples?"
},
{
"fileData": {
"fileUri": "gs://cloud-samples-data/generative-ai/video/animals.mp4",
"mimeType": "video/mp4"
},
"text": null
},
{
"fileData": {
"fileUri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg",
"mimeType": "image/jpeg"
},
"text": null
}
],
"role": "user"
}
]
},
"response": {
"candidates": [
{
"avgLogprobs": -0.5782725546095107,
"content": {
"parts": [
{
"text": "This video shows a Google Photos marketing campaign where animals at the Los Angeles Zoo take self-portraits using a modified Google phone housed in a protective case. The image is unrelated."
}
],
"role": "model"
},
"finishReason": "STOP"
}
],
"modelVersion": "gemini-1.5-flash-002@default",
"usageMetadata": {
"candidatesTokenCount": 36,
"promptTokenCount": 29180,
"totalTokenCount": 29216
}
}
}
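To show how the response and status fields might be consumed, the sketch below reads output lines shaped like the example above and separates successes from failures. It assumes that an empty status means the row succeeded, as in the example. The helper name summarize_output_line and the error text in the sample data are hypothetical.

```python
import json


def summarize_output_line(line):
    """Return the generated text or the error for one output JSONL line."""
    record = json.loads(line)
    status = record.get("status")
    if status:
        # A non-empty status carries the error details for this row.
        return {"ok": False, "error": status, "text": None}
    candidates = (record.get("response") or {}).get("candidates") or []
    if not candidates:
        return {"ok": False, "error": "no candidates in response", "text": None}
    parts = candidates[0].get("content", {}).get("parts", [])
    # Concatenate the text parts of the first candidate.
    text = "".join(part.get("text") or "" for part in parts)
    return {"ok": True, "error": None, "text": text}


sample_lines = [
    json.dumps({
        "status": "",
        "response": {
            "candidates": [
                {"content": {"parts": [{"text": "The image is unrelated."}], "role": "model"}}
            ]
        },
    }),
    # Hypothetical failed row: the error text is illustrative only.
    json.dumps({"status": "Example error details", "response": None}),
]
results = [summarize_output_line(line) for line in sample_lines]
```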
Batch prediction for BigQuery
Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.
Prepare your inputs
BigQuery storage input
Your service account must have appropriate BigQuery permissions. To grant the service account the BigQuery User role, use the gcloud projects add-iam-policy-binding command as follows:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"
Replace the following values:
- PROJECT_ID: The project that your service account was created in.
- SERVICE_ACCOUNT_ID: The ID for the service account.
- A request column is required, and must be valid JSON. This JSON data represents your input for the model.
- The content in the request column must match the structure of a GenerateContentRequest.
- Your input table can have columns other than request. These columns can have any BigQuery data type except the following: array, struct, range, datetime, and geography. These columns are ignored for content generation but included in the output table.
- The system reserves two column names for output: response and status. These are used to provide information about the outcome of the batch prediction job.
The following Gemini models support fileData:
- gemini-1.5-flash-002
- gemini-1.5-flash-001
- gemini-1.5-pro-002
- gemini-1.5-pro-001
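The input table requirements above can be checked before a job is submitted. The following is a minimal sketch under several assumptions: the schema is given as a plain mapping of column name to type name, input columns named response or status are flagged as conflicts with the reserved output columns, and a request is considered plausible if it is a JSON object with a contents field. The function name check_input_table is hypothetical.

```python
import json

# Column types that the page lists as unsupported for extra input columns.
UNSUPPORTED_TYPES = {"ARRAY", "STRUCT", "RANGE", "DATETIME", "GEOGRAPHY"}
# Column names reserved for the job's output.
RESERVED_COLUMNS = {"response", "status"}


def check_input_table(schema, sample_request):
    """Return a list of problems found in a schema and one sample request value."""
    problems = []
    if "request" not in schema:
        problems.append("missing required 'request' column")
    for name, type_name in schema.items():
        if name == "request":
            continue
        if name in RESERVED_COLUMNS:
            problems.append(f"column '{name}' uses a reserved output name")
        if type_name.upper() in UNSUPPORTED_TYPES:
            problems.append(f"column '{name}' has unsupported type {type_name}")
    try:
        parsed = json.loads(sample_request)
    except (TypeError, ValueError):
        problems.append("request value is not valid JSON")
    else:
        # Assumption: a GenerateContentRequest is an object with "contents".
        if not isinstance(parsed, dict) or "contents" not in parsed:
            problems.append("request JSON has no 'contents' field")
    return problems


good = check_input_table(
    {"request": "JSON", "title": "STRING"},
    '{"contents": [{"role": "user", "parts": [{"text": "Describe this book."}]}]}',
)
bad = check_input_table(
    {"title": "STRING", "status": "STRING", "tags": "ARRAY"},
    "not json",
)
```

Here good is an empty list, while bad reports the missing request column, the reserved status name, the array column, and the invalid JSON.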
Request a batch prediction job
REST
To create a batch prediction job, use the projects.locations.batchPredictionJobs.create method.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports Gemini models.
- PROJECT_ID: Your project ID.
- INPUT_URI: The BigQuery table where your batch prediction input is located, such as bq://myproject.mydataset.input_table. Multi-region datasets are not supported.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. The region of the output BigQuery dataset must be the same as that of the Vertex AI batch prediction job. For Cloud Storage, specify the bucket and directory location, such as gs://mybucket/path/to/output.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{
  "displayName": "my-bigquery-batch-prediction-job",
  "model": "publishers/google/models/gemini-1.5-flash-002",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response that includes a unique identifier for the batch job.
You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:
curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Retrieve batch output
When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request. For rows that succeed, model responses are stored in the response column. Otherwise, error details are stored in the status column for further inspection.
BigQuery output example
request | response | status |
---|---|---|
{"content":[{...}]} | { "candidates": [ { "content": { "role": "model", "parts": [ { "text": "In a medium bowl, whisk together the flour, baking soda, baking powder." } ] }, "finishReason": "STOP", "safetyRatings": [ { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE", "probabilityScore": 0.14057204, "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.14270912 } ] } ], "usageMetadata": { "promptTokenCount": 8, "candidatesTokenCount": 396, "totalTokenCount": 404 } } | |
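Once the output rows are loaded, the usageMetadata in each response can be totaled to see how many tokens a job consumed. The sketch below assumes rows are available as dictionaries with response and status keys, and that response may arrive either as a JSON string or as an already parsed object. The function name total_token_usage and the error text in the sample data are hypothetical.

```python
import json

USAGE_KEYS = ("promptTokenCount", "candidatesTokenCount", "totalTokenCount")


def total_token_usage(rows):
    """Sum usageMetadata over successful rows and count the failed ones."""
    totals = {key: 0 for key in USAGE_KEYS}
    failed = 0
    for row in rows:
        if row.get("status"):
            # Rows with error details have no usable response.
            failed += 1
            continue
        response = row.get("response")
        if isinstance(response, str):
            response = json.loads(response)
        usage = (response or {}).get("usageMetadata", {})
        for key in USAGE_KEYS:
            totals[key] += usage.get(key, 0)
    return totals, failed


rows = [
    {
        "response": json.dumps({
            "usageMetadata": {
                "promptTokenCount": 8,
                "candidatesTokenCount": 396,
                "totalTokenCount": 404,
            }
        }),
        "status": "",
    },
    # Hypothetical failed row: the error text is illustrative only.
    {"response": None, "status": "Example error details"},
]
totals, failed = total_token_usage(rows)
```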
What's next
- Learn how to tune a Gemini model in Overview of model tuning for Gemini.
- Learn more about the Batch prediction API.