You can use batch predictions to send a large number of non-latency-sensitive prompts to an Anthropic Claude model in a single request. Unlike online predictions, which process one prompt per request, batch predictions are ideal for large-scale, offline jobs. This page shows you how to get batch predictions from Anthropic Claude models.
Supported Anthropic Claude models
Vertex AI supports batch predictions for the following Anthropic Claude models:
Quotas: By default, you can make up to four concurrent batch requests in a single project.
Prepare your input data
You can provide batch input data using a BigQuery table or a JSON Lines (JSONL) file stored in Cloud Storage.
| Input source | Description | Use case |
|---|---|---|
| BigQuery | Input data is stored in a BigQuery table with specific columns for the request ID and prompt. | Ideal when your source data already resides in BigQuery or when you need to perform complex data preparation and transformation using SQL. |
| Cloud Storage (JSONL) | Input data is a JSONL file where each line is a separate JSON request object. | Best for when your data is generated as individual records, exported from systems that produce JSON, or when you prefer a simple, file-based workflow. |
The input for both sources must follow the Anthropic Claude API Schema JSON format, as shown in the following example:
{
"custom_id": "request-1",
"request": {
"messages": [{"role": "user", "content": "Hello!"}],
"anthropic_version": "vertex-2023-10-16",
"max_tokens": 50
}
}
BigQuery
Your BigQuery input table must adhere to the following schema:
| Column name | Description |
|---|---|
| custom_id | An ID for each request to match the input with the output. |
| request | The request body, which is your input prompt and must follow the Anthropic Claude API Schema. |
- Your input table can have other columns, which are ignored by the batch job.
- Batch prediction jobs reserve two column names for the output: response(JSON) and status. Don't use these column names in your input table.
Cloud Storage
For Cloud Storage, the input must be a JSONL file located in a Cloud Storage bucket. Each line of the file is a separate JSON request object.
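As a minimal illustration, the following sketch writes a JSONL input file that follows the schema shown above. The prompts, file name, and max_tokens value are placeholders, not requirements.

```python
import json

# Hypothetical example prompts; each record follows the Anthropic Claude
# API Schema shown above (anthropic_version is required for Claude on Vertex AI).
prompts = ["Hello!", "Summarize the theory of relativity in one sentence."]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        record = {
            "custom_id": f"request-{i}",
            "request": {
                "messages": [{"role": "user", "content": prompt}],
                "anthropic_version": "vertex-2023-10-16",
                "max_tokens": 50,
            },
        }
        # One JSON object per line, as JSONL requires.
        f.write(json.dumps(record) + "\n")
```

Upload the resulting file to a Cloud Storage bucket (for example, with gcloud storage cp) before submitting the job.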
Request a batch prediction
After you prepare your input data, you can request a batch prediction. You can use input from either BigQuery or Cloud Storage and write the output to either a BigQuery table or a JSONL file in a Cloud Storage bucket.
From BigQuery
Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
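The SDK sample is not reproduced here. As a stdlib-only illustration of what a submission needs to contain, the following sketch builds the same request body that the REST section uses for BigQuery input; the function name, its defaults, and the example model name are illustrative, not part of any SDK.

```python
import json

def build_batch_job_body(job_name, model, input_uri, output_uri,
                         output_format="bigquery"):
    """Build a batchPredictionJobs request body for BigQuery input.

    output_format is "bigquery" or "jsonl"; it selects the matching
    destination object and URI field name, as in the REST example.
    """
    if output_format == "bigquery":
        destination, uri_field = "bigqueryDestination", "outputUri"
    else:
        destination, uri_field = "gcsDestination", "outputUriPrefix"
    return {
        "displayName": job_name,
        "model": f"publishers/anthropic/models/{model}",
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": output_format,
            destination: {uri_field: output_uri},
        },
    }

body = build_batch_job_body(
    "my-batch-job", "claude-3-5-sonnet",  # model name is illustrative
    "bq://myproject.mydataset.input_table",
    "bq://myproject.mydataset.output_result",
)
print(json.dumps(body, indent=2))
```

POSTing this body to the batchPredictionJobs endpoint with an access token is what the curl and PowerShell examples below do.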
REST
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports the selected Anthropic Claude model (see Claude Regions).
- PROJECT_ID: Your project ID.
- MODEL: The name of the model.
- INPUT_URI: The BigQuery table where your batch prediction input is located, such as bq://myproject.mydataset.input_table.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
From Cloud Storage
Specify your JSONL file's Cloud Storage location, model, and output location.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
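The SDK sample is not reproduced here. As a stdlib-only illustration, the following sketch builds the request body that the REST section uses for Cloud Storage input. The function and example values are illustrative; note that the underlying gcsSource.uris field is a list, so the sketch accepts multiple input files.

```python
import json

def build_gcs_batch_job_body(job_name, model, input_uris, output_uri_prefix):
    """Build a batchPredictionJobs request body for Cloud Storage input.

    input_uris is a list of gs:// locations of JSONL input files;
    output is written under a Cloud Storage prefix (predictionsFormat
    "jsonl", gcsDestination), as in the REST example.
    """
    return {
        "displayName": job_name,
        "model": f"publishers/anthropic/models/{model}",
        "inputConfig": {
            "instancesFormat": "jsonl",
            "gcsSource": {"uris": input_uris},
        },
        "outputConfig": {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri_prefix},
        },
    }

body = build_gcs_batch_job_body(
    "my-gcs-batch-job", "claude-3-5-sonnet",  # model name is illustrative
    ["gs://bucketname/path/to/input.jsonl"],
    "gs://mybucket/path/to/output/",
)
print(json.dumps(body, indent=2))
```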
REST
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports the selected Anthropic Claude model (see Claude Regions).
- PROJECT_ID: Your project ID.
- MODEL: The name of the model.
- INPUT_URIS: A comma-separated list of the Cloud Storage locations of your JSONL batch prediction input, such as gs://bucketname/path/to/jsonl.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": "INPUT_URIS"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Get the status of a batch prediction job
After you submit a batch prediction job, you can check its status to see if it has completed.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region where your batch job is located.
- JOB_ID: The batch job ID that was returned when you created the job.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
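The returned job resource includes a state field with a JOB_STATE_* enum value. As a minimal sketch, the helper below checks whether a job has reached a terminal state; the set of states listed is a common subset, not exhaustive.

```python
# Terminal states for a Vertex AI batch prediction job; the GET response
# includes a "state" field with one of the JOB_STATE_* enum values.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}

def is_finished(job: dict) -> bool:
    """Return True when the job resource has reached a terminal state."""
    return job.get("state") in TERMINAL_STATES

# Hypothetical response fragment to illustrate the check.
job = {"name": "projects/my-project/locations/us-east5/batchPredictionJobs/123",
       "state": "JOB_STATE_RUNNING"}
print(is_finished(job))  # False while the job is still running
```

In practice you would poll the GET endpoint shown above on an interval and stop once this check returns True.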
Retrieve batch prediction output
When a batch prediction job completes, the output is saved to the destination you specified in your request.
- For BigQuery output: The output is in the response(JSON) column of your destination table.
- For Cloud Storage output: The output is saved as one or more JSONL files in the specified Cloud Storage location.
You can access the full batch prediction results after all rows have completed or after 24 hours, whichever comes first.
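Because each output row carries the custom_id from your input, you can join results back to prompts. The sketch below indexes responses by custom_id using the reserved output columns described earlier; the sample row, the treatment of a non-empty status as an error, and the exact response shape are hedged assumptions, so adapt the field handling to the output you actually receive.

```python
import json

def index_responses(rows):
    """Map each custom_id to its parsed response.

    rows: iterable of dicts with the reserved output columns described
    above: "response" holds the model response as JSON and "status"
    reports per-request outcome. Assumption: a non-empty status marks a
    failed request; exact semantics may vary by model and API version.
    """
    results = {}
    for row in rows:
        if row.get("status"):  # hedged assumption: non-empty status = error
            continue
        response = row["response"]
        if isinstance(response, str):  # BigQuery stores the JSON as a string
            response = json.loads(response)
        results[row["custom_id"]] = response
    return results

# Hypothetical rows illustrating the shape, not real job output.
rows = [
    {"custom_id": "request-1", "status": "",
     "response": '{"content": [{"type": "text", "text": "Hi!"}]}'},
]
print(index_responses(rows)["request-1"]["content"][0]["text"])  # Hi!
```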