Batch predictions with Anthropic Claude models

You can use batch predictions to send a large number of non-latency-sensitive prompts to an Anthropic Claude model in a single request. Unlike online predictions, which process one prompt per request, batch predictions are ideal for large-scale, offline jobs.

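This page shows you how to get batch predictions from Anthropic Claude models: which models are supported, how to prepare your input data, how to request a batch prediction, how to check a job's status, and how to retrieve the output.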

Supported Anthropic Claude models

Vertex AI supports batch predictions for the following Anthropic Claude models:

Quotas: By default, you can make up to four concurrent batch requests in a single project.
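If you have more jobs to run than the default quota allows, one option is to cap how many are in flight on the client side. The following is a minimal sketch, not an official pattern: `run_all` and `fake_run_job` are hypothetical names, and `run_job` stands for any function of yours that submits one batch job and blocks until that job finishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Default concurrent batch request limit per project, as stated above.
MAX_CONCURRENT_JOBS = 4


def run_all(job_specs, run_job, max_concurrent=MAX_CONCURRENT_JOBS):
    """Call run_job(spec) for every spec, with at most max_concurrent running at once.

    Results are returned in the same order as job_specs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_job, job_specs))


def fake_run_job(spec):
    # Hypothetical stand-in for "submit the job and poll until it finishes".
    time.sleep(0.01)
    return f"{spec}: done"


print(run_all(["job-a", "job-b", "job-c", "job-d", "job-e"], fake_run_job))
```

The quota applies to the whole project, so this cap only helps if no other process is submitting batch jobs to the same project at the same time.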

Prepare your input data

You can provide batch input data using a BigQuery table or a JSON Lines (JSONL) file stored in Cloud Storage.

Input Source | Description | Use Case
BigQuery | Input data is stored in a BigQuery table with specific columns for the request ID and prompt. | Ideal when your source data already resides in BigQuery or when you need to perform complex data preparation and transformation using SQL.
Cloud Storage (JSONL) | Input data is a JSONL file where each line is a separate JSON request object. | Best for when your data is generated as individual records, exported from systems that produce JSON, or when you prefer a simple, file-based workflow.

The input for both sources must follow the Anthropic Claude API Schema JSON format, as shown in the following example:

{
  "custom_id": "request-1",
  "request":  {
    "messages": [{"role": "user", "content": "Hello!"}],
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 50
  }
}

BigQuery

Your BigQuery input table must adhere to the following schema:

Column name | Description
custom_id | An ID for each request to match the input with the output.
request | The request body, which is your input prompt and must follow the Anthropic Claude API Schema.
  • Your input table can have other columns, which are ignored by the batch job.
  • Batch prediction jobs reserve two column names for the output: response (JSON) and status. Don't use these column names in your input table.
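Before you submit a job, you might check a table's column names against these rules. The following is a minimal sketch in plain Python: `check_input_columns` is a hypothetical helper, and it assumes you have already fetched the column names as a list of strings by whatever means you prefer.

```python
REQUIRED_COLUMNS = {"custom_id", "request"}
RESERVED_COLUMNS = {"response", "status"}


def check_input_columns(column_names):
    """Return a list of problems found in a BigQuery input table's column names."""
    # BigQuery treats column names case-insensitively, so compare in lowercase.
    names = {name.lower() for name in column_names}
    problems = []
    for missing in sorted(REQUIRED_COLUMNS - names):
        problems.append(f"missing required column: {missing}")
    for clash in sorted(RESERVED_COLUMNS & names):
        problems.append(f"reserved output column used: {clash}")
    return problems


# Extra columns such as "notes" are fine; the batch job ignores them.
print(check_input_columns(["custom_id", "request", "notes"]))
print(check_input_columns(["custom_id", "status"]))
```

An empty list means the column names satisfy the schema described above. The check does not validate column types or the contents of the request column.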

Cloud Storage

For Cloud Storage, the input must be a JSONL file located in a Cloud Storage bucket. Each line of the file is a separate JSON request object.
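As an illustration, the following sketch builds JSONL text from a list of prompts, using the request shape from the example earlier on this page. The function name `build_jsonl` and the `request-N` ID scheme are hypothetical choices. Uploading the resulting file to a bucket is left out.

```python
import json

# Value copied from the example request shown earlier on this page.
ANTHROPIC_VERSION = "vertex-2023-10-16"


def build_jsonl(prompts, max_tokens=50):
    """Return JSONL text with one batch request object per prompt."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        record = {
            # Hypothetical ID scheme; the ID is what lets you match output to input.
            "custom_id": f"request-{index}",
            "request": {
                "messages": [{"role": "user", "content": prompt}],
                "anthropic_version": ANTHROPIC_VERSION,
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        lines.append(json.dumps(record))
    return "".join(line + "\n" for line in lines)


jsonl_text = build_jsonl(["Hello!", "Summarize the water cycle in one sentence."])
print(jsonl_text, end="")
```

You can write `jsonl_text` to a local file and upload it to your bucket with the tool of your choice.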

Request a batch prediction

After you prepare your input data, you can request a batch prediction. You can use input from either BigQuery or Cloud Storage and write the output to either a BigQuery table or a JSONL file in a Cloud Storage bucket.

From BigQuery

Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# TODO(developer): Replace with your own output table
output_uri = "bq://your-project.your_dataset.your_table"

job = client.batches.create(
    # Check Anthropic Claude region availability in https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#regions
    # More about Anthropic model: https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-haiku
    model="publishers/anthropic/models/claude-3-5-haiku",
    # The source dataset needs to be created specifically in us-east5
    src="bq://python-docs-samples-tests.anthropic_bq_sample.test_data",
    config=CreateBatchJobConfig(dest=output_uri),
)
print(f"Job name: {job.name}")
print(f"Job state: {job.state}")
# Example response:
# Job name: projects/%PROJECT_ID%/locations/us-central1/batchPredictionJobs/9876453210000000000
# Job state: JOB_STATE_PENDING

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.types.BatchJob
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}

while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_RUNNING
# ...
# Job state: JOB_STATE_SUCCEEDED

REST

Before using any of the request data, make the following replacements:

  • LOCATION: A region that supports the selected Anthropic Claude model (see Claude Regions).
  • PROJECT_ID: Your project ID.
  • MODEL: The name of the model.
  • JOB_NAME: A display name for your job.
  • INPUT_URI: The BigQuery table where your batch prediction input is located, such as bq://myproject.mydataset.input_table.
  • OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
  • DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
  • OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
  • OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
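The OUTPUT_FORMAT, DESTINATION, and OUTPUT_URI_FIELD_NAME placeholders all follow from whether the output URI points at BigQuery or Cloud Storage. The following sketch fills in the request body on that basis. The function names are hypothetical, and the display name and URIs in the usage lines are sample values only.

```python
import json


def build_output_config(output_uri):
    """Pick the output format and destination fields from the URI scheme."""
    if output_uri.startswith("bq://"):
        return {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        }
    if output_uri.startswith("gs://"):
        return {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri},
        }
    raise ValueError(f"unsupported output URI: {output_uri!r}")


def build_bigquery_batch_request(display_name, model, input_uri, output_uri):
    """Build the request body for a batch job that reads from a BigQuery table."""
    return {
        "displayName": display_name,
        "model": f"publishers/anthropic/models/{model}",
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": build_output_config(output_uri),
    }


body = build_bigquery_batch_request(
    display_name="my-batch-job",  # hypothetical display name
    model="claude-3-5-haiku",
    input_uri="bq://myproject.mydataset.input_table",
    output_uri="gs://mybucket/path/to/outputfile",
)
print(json.dumps(body, indent=2))
```

You could write the printed JSON to request.json and send it with one of the commands that follow.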

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response that describes the new batch prediction job.

From Cloud Storage

Specify your JSONL file's Cloud Storage location, model, and output location.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# TODO(developer): Replace with your own output location
output_uri = "gs://your-bucket/your-prefix"

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
job = client.batches.create(
    # More about Anthropic model: https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-haiku
    model="publishers/anthropic/models/claude-3-5-haiku",
    # Source link: https://storage.cloud.google.com/cloud-samples-data/batch/anthropic-test-data-gcs.jsonl
    src="gs://cloud-samples-data/anthropic-test-data-gcs.jsonl",
    config=CreateBatchJobConfig(dest=output_uri),
)
print(f"Job name: {job.name}")
print(f"Job state: {job.state}")
# Example response:
# Job name: projects/%PROJECT_ID%/locations/us-central1/batchPredictionJobs/9876453210000000000
# Job state: JOB_STATE_PENDING

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.types.BatchJob
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}

while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_RUNNING
# ...
# Job state: JOB_STATE_SUCCEEDED

REST

Before using any of the request data, make the following replacements:

  • LOCATION: A region that supports the selected Anthropic Claude model (see Claude Regions).
  • PROJECT_ID: Your project ID.
  • MODEL: The name of the model.
  • JOB_NAME: A display name for your job.
  • INPUT_URIS: A comma-separated list of the Cloud Storage locations of your JSONL batch prediction input, such as gs://bucketname/path/to/jsonl.
  • OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
  • DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
  • OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
  • OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": "INPUT_URIS"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response that describes the new batch prediction job.

Get the status of a batch prediction job

After you submit a batch prediction job, you can check its status to see if it has completed.

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region where your batch job is located.
  • JOB_ID: The batch job ID that was returned when you created the job.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID
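If you created the job with the Python samples on this page, the printed job name already contains the project, location, and job ID. The following sketch derives the status URL from such a name. The function name `status_url` and the sample project ID are hypothetical, and the name format is assumed to match the example output shown in the Python samples.

```python
import re

# Format taken from the example "Job name" output in the Python samples.
JOB_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/batchPredictionJobs/(?P<job_id>[^/]+)$"
)


def status_url(job_name):
    """Build the GET URL for a batch prediction job from its full resource name."""
    match = JOB_NAME_PATTERN.match(job_name)
    if match is None:
        raise ValueError(f"unexpected job name format: {job_name!r}")
    location = match["location"]
    # The regional endpoint and the resource path both carry the location.
    return f"https://{location}-aiplatform.googleapis.com/v1/{job_name}"


print(
    status_url(
        "projects/my-project/locations/us-east5/batchPredictionJobs/9876453210000000000"
    )
)
```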

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID" | Select-Object -Expand Content

You should receive a JSON response that describes the batch prediction job, including its current state.

Retrieve batch prediction output

When a batch prediction job completes, the output is saved to the destination you specified in your request.

  • For BigQuery output: The output is in the response (JSON) column of your destination table.
  • For Cloud Storage output: The output is saved as one or more JSONL files in the specified Cloud Storage location.

You can access the full batch prediction results after all rows have completed or after 24 hours, whichever comes first.
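Once you have downloaded the JSONL output, you can use custom_id to match results back to the requests you submitted. The following sketch rests on an assumption that this page does not confirm: that each output line is a JSON object carrying the original custom_id, mirroring the BigQuery output columns. Check one of your own output files before relying on it. The function names and the sample lines are hypothetical.

```python
import json


def index_by_custom_id(jsonl_text):
    """Parse JSONL output text into a dict keyed by custom_id."""
    records = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Assumption: every output record carries the custom_id from the input.
        records[record["custom_id"]] = record
    return records


def missing_ids(submitted_ids, records):
    """Return submitted IDs that have no output record, in submission order."""
    return [cid for cid in submitted_ids if cid not in records]


# Hypothetical output lines, invented only to exercise the functions.
sample_output = "\n".join(
    [
        json.dumps({"custom_id": "request-1", "response": {}, "status": ""}),
        json.dumps({"custom_id": "request-3", "response": {}, "status": ""}),
    ]
)
records = index_by_custom_id(sample_output)
print(missing_ids(["request-1", "request-2", "request-3"], records))
```

If the job wrote several JSONL files, you can concatenate their contents before calling `index_by_custom_id`.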