Batch predictions are a way to efficiently send multiple multimodal prompts that are not latency sensitive. Unlike online prediction, where you are limited to one input prompt at a time, you can send a large number of multimodal prompts in a single batch request. Your responses are then asynchronously written to your BigQuery output location.
Batch requests for Gemini models are discounted 50% from standard requests. To learn more, see the Pricing page.
Multimodal models that support batch predictions
The following multimodal models support batch predictions:
- gemini-1.5-flash-001
- gemini-1.5-pro-001
- gemini-1.0-pro-002
- gemini-1.0-pro-001
Prepare your inputs
Batch requests for multimodal models accept BigQuery storage sources and Cloud Storage sources.
BigQuery storage input
- The content in the `request` column must be valid JSON. This JSON data represents your input for the model.
- The content in the JSON instructions must match the structure of a `GenerateContentRequest`.
- Your input table can have columns other than `request`. They are ignored for content generation but are included in the output table. The system reserves two column names for output: `response` and `status`. These are used to provide information about the outcome of the batch prediction job.
- Batch prediction doesn't support the `fileData` field for Gemini.
Example input (JSON)
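As a minimal sketch of the shape described above, the JSON stored in the `request` column might look like the following (the prompt text and `generationConfig` values are illustrative, not required fields):

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "Give me a recipe for banana bread."}
      ]
    }
  ],
  "generationConfig": {
    "temperature": 0.4
  }
}
```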
Cloud Storage input
- File format: JSON Lines (JSONL)
- Located in us-central1
- Appropriate read permissions for the service account
- `fileData` limitations apply for certain Gemini models
Example input (JSONL)

{"request":{"contents": [{"role": "user", "parts": [{"text": "What is the relation between the following video and image samples?"}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/animals.mp4", "mime_type": "video/mp4"}}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg", "mime_type": "image/jpeg"}}]}]}}
{"request":{"contents": [{"role": "user", "parts": [{"text": "Describe what is happening in this video."}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/another_video.mov", "mime_type": "video/mov"}}]}]}}
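Because every line of the input file must be independently parseable JSON with a top-level `request` key, it can help to validate the file before uploading it to Cloud Storage. A minimal sketch in Python (the `validate_jsonl_line` helper and the `gs://my-bucket` URI are hypothetical, not part of any Google client library):

```python
import json

def validate_jsonl_line(line: str) -> bool:
    """Check that one JSONL record looks like a batch prediction input row.

    Each line must be a JSON object with a top-level "request" key whose
    value follows the GenerateContentRequest shape: a "contents" list of
    messages, each with "role" and "parts".
    """
    record = json.loads(line)  # raises ValueError on malformed JSON
    request = record.get("request")
    if not isinstance(request, dict):
        return False
    contents = request.get("contents")
    if not isinstance(contents, list) or not contents:
        return False
    return all("role" in turn and "parts" in turn for turn in contents)

sample = (
    '{"request": {"contents": [{"role": "user", '
    '"parts": [{"text": "Describe this video."}, '
    '{"file_data": {"file_uri": "gs://my-bucket/video.mp4", '
    '"mime_type": "video/mp4"}}]}]}}'
)
print(validate_jsonl_line(sample))  # True for a well-formed line
```

Running a check like this over every line catches malformed records before the batch job fails partway through.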
Request a batch response
Depending on the number of input items that you submitted, a batch generation task can take some time to complete.
REST
To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.
Before using any of the request data, make the following replacements:
- PROJECT_ID: The name of your Google Cloud project.
- BP_JOB_NAME: A name you choose for your job.
- INPUT_URI: The input source URI. This is either a BigQuery table URI in the form `bq://PROJECT_ID.DATASET.TABLE` or a Cloud Storage bucket URI.
- INPUT_SOURCE: The input source type: `bigquerySource` or `gcsSource`.
- INSTANCES_FORMAT: The format of the input instances: `jsonl` or `bigquery`.
- OUTPUT_URI: The URI of the target output table, in the form `bq://PROJECT_ID.DATASET.TABLE`. If the table doesn't already exist, it is created for you.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs
Request JSON body:
{
  "displayName": "BP_JOB_NAME",
  "model": "publishers/google/models/gemini-1.0-pro-002",
  "inputConfig": {
    "instancesFormat": "INSTANCES_FORMAT",
    "INPUT_SOURCE": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  }
}
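As a sketch of how the placeholders fit together, the request body can be assembled programmatically; all concrete values here (project, dataset, table, and job names) are hypothetical:

```python
import json

# Hypothetical placeholder values; substitute your own project, dataset,
# table names, and job name before sending the request.
input_source = "bigquerySource"   # the INPUT_SOURCE placeholder: "bigquerySource" or "gcsSource"
instances_format = "bigquery"     # the INSTANCES_FORMAT placeholder: "bigquery" or "jsonl"

body = {
    "displayName": "my-batch-job",
    "model": "publishers/google/models/gemini-1.0-pro-002",
    "inputConfig": {
        "instancesFormat": instances_format,
        # The key name itself carries the source type; this is what
        # replacing INPUT_SOURCE in the template means.
        input_source: {"inputUri": "bq://my-project.my_dataset.input_table"},
    },
    "outputConfig": {
        "predictionsFormat": "bigquery",
        "bigqueryDestination": {
            "outputUri": "bq://my-project.my_dataset.output_table"
        },
    },
}
print(json.dumps(body, indent=2))
```

Serializing the dict with `json.dumps` produces the same body you would save to request.json for the curl and PowerShell commands that follow.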
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "My first batch prediction",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-002",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_output"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "modelVersionId": "1"
}
The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
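A minimal polling loop can be sketched as follows; `wait_for_batch_job` and the stubbed fetcher are illustrative helpers, not part of the Vertex AI client libraries. The `fetch_job` callable stands in for whatever wraps the GET request shown above:

```python
import time

def wait_for_batch_job(fetch_job, poll_seconds=30, max_polls=10):
    """Poll a batch prediction job until it reaches a terminal state.

    fetch_job is any callable returning the job resource as a dict.
    Terminal states follow the JobState values used in the response
    (JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, JOB_STATE_CANCELLED).
    """
    terminal = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("state") in terminal:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError("batch job did not finish in time")

# Usage with a stubbed fetcher that succeeds on the second poll:
states = iter(["JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
job = wait_for_batch_job(lambda: {"state": next(states)}, poll_seconds=0)
print(job["state"])  # JOB_STATE_SUCCEEDED
```

In practice you would pass a fetcher that performs the authenticated GET request and parses the JSON response.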
Retrieve batch output
When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.
BigQuery output example
The output table repeats the `request` column and adds the reserved `response` and `status` columns. For the example row, the `request` value is '{"content":[{...}]}' and the `response` value is:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "In a medium bowl, whisk together the flour, baking soda, baking powder."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14057204,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.14270912
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 8,
    "candidatesTokenCount": 396,
    "totalTokenCount": 404
  }
}
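A short sketch of pulling the generated text out of a `response` column value; `extract_text` is a hypothetical helper, and the JSON below is the example response from the output table, abbreviated to the fields the helper touches:

```python
import json

# Abbreviated copy of the example response column value.
response_json = """{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [{"text": "In a medium bowl, whisk together the flour, baking soda, baking powder."}]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {"promptTokenCount": 8, "candidatesTokenCount": 396, "totalTokenCount": 404}
}"""

def extract_text(response: str) -> str:
    """Concatenate the text parts of the first candidate in a response."""
    data = json.loads(response)
    parts = data["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)

print(extract_text(response_json))
```

The same parsing applies whether the row comes from a BigQuery query result or anywhere else you read the `response` column.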
Cloud Storage output example
PROJECT_ID=[PROJECT ID]
REGION="us-central1"
MODEL_URI="publishers/google/models/gemini-1.0-pro-001"
INPUT_URI="[GCS INPUT URI]"
OUTPUT_URI="[OUTPUT URI]"

# Set variables based on the parameters above
ENDPOINT="${REGION}-aiplatform.googleapis.com"
API_VERSION=v1
BP_JOB_NAME="BP_testing_`date +%Y%m%d_%H%M%S`"

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${ENDPOINT}/${API_VERSION}/projects/${PROJECT_ID}/locations/${REGION}/batchPredictionJobs \
  -d '{
    "displayName": "'${BP_JOB_NAME}'",
    "model": "'${MODEL_URI}'",
    "inputConfig": {
      "instancesFormat": "jsonl",
      "gcsSource": {
        "uris": ["'${INPUT_URI}'"]
      }
    },
    "outputConfig": {
      "predictionsFormat": "jsonl",
      "gcsDestination": {
        "outputUriPrefix": "'${OUTPUT_URI}'"
      }
    }
  }'
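Assuming each line of the JSONL output written under the `outputUriPrefix` carries the same fields as the BigQuery output columns (`request`, `response`, and `status`), the files can be parsed with a short helper; `read_predictions` and the sample line are illustrative:

```python
import json

def read_predictions(jsonl_text: str):
    """Yield (request, response) pairs from batch output in JSONL form.

    Assumes each line is a JSON object with the same fields as the
    BigQuery output columns: "request", "response", and "status".
    """
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines defensively
        record = json.loads(line)
        yield record.get("request"), record.get("response")

# Hypothetical single-line output file:
sample = '{"request": {"contents": []}, "response": {"candidates": []}, "status": ""}'
pairs = list(read_predictions(sample))
print(len(pairs))  # 1
```

To process real output, download the files from the Cloud Storage prefix and feed each file's contents to the helper.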
What's next
- Learn how to tune a Gemini model in Overview of model tuning for Gemini
- Learn more about the Batch prediction API.