You can use Anthropic's SDK or curl commands to send requests to the
Vertex AI endpoint using the following model names:
Anthropic Claude model versions must be used with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022) to guarantee consistent behavior.
To use the Anthropic Claude models with Vertex AI, you must perform the following steps. Make sure you have the required permissions to enable and use partner models. For more information, see Grant the required permissions.

1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
2. Verify that billing is enabled for your Google Cloud project.
3. Enable the Vertex AI API.
Anthropic recommends that you enable 30-day logging of your prompt and
completion activity to record any model misuse. To enable logging, see Log
requests and responses.

You can make API requests to the Anthropic Claude models using the Anthropic Claude SDK. To learn more, see the following:

The following code sample uses the Anthropic Vertex SDK to perform a streaming
call to a Claude model. To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python.
For more information, see the
Python API reference documentation.
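The streaming sample itself was not preserved here, so the following is a minimal sketch of such a call, assuming the anthropic[vertex] Python package. The project, region, and model values are placeholders you must replace.

```python
def stream_claude(project_id: str, region: str, model: str, prompt: str) -> str:
    """Stream a response from a Claude model and return the concatenated text."""
    # Imported inside the function so the sketch can be read without the SDK installed.
    from anthropic import AnthropicVertex  # pip install "anthropic[vertex]"

    client = AnthropicVertex(project_id=project_id, region=region)
    parts = []
    with client.messages.stream(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:  # text deltas arrive as they are generated
            parts.append(text)
    return "".join(parts)
```

Running this requires application default credentials, for example from `gcloud auth application-default login`.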
The following code sample uses the Anthropic Vertex SDK to perform a unary call
to a Claude model. To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python.
For more information, see the
Python API reference documentation.
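As with the streaming sample, the unary sample was not preserved; this is a minimal sketch under the same assumption of the anthropic[vertex] Python package, with placeholder project, region, and model values.

```python
def ask_claude(project_id: str, region: str, model: str, prompt: str) -> str:
    """Make a single (unary) request and return the model's text output."""
    # Imported inside the function so the sketch can be read without the SDK installed.
    from anthropic import AnthropicVertex  # pip install "anthropic[vertex]"

    client = AnthropicVertex(project_id=project_id, region=region)
    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # message.content is a list of blocks; text blocks carry the output text
    return "".join(block.text for block in message.content if block.type == "text")
```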
You can use a curl command to make a request to the Vertex AI endpoint.
The curl command specifies which supported Claude model you want to use.
Anthropic Claude model versions must be used with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022) to guarantee consistent behavior.
The following topic shows you how to create a curl command and includes a sample
curl command. To test a text prompt by using the Vertex AI API, send a POST request to the
publisher model endpoint.
Before using any of the request data,
make the following replacements:
- MAX_TOKENS: Specify a lower value for shorter responses and a higher value for potentially longer responses.
- TEMPERATURE: Specify a lower value for less random responses and a higher value for more random responses.
- TOP_K: For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.
- TOP_P: Specify a lower value for less random responses and a higher value for more random responses.
HTTP method and URL:
Request JSON body:
To send your request, choose one of these options:

curl: Save the request body in a file named request.json, and execute the curl command.

PowerShell: Save the request body in a file named request.json, and execute the Invoke-WebRequest command.

You should receive a JSON response similar to the following.

The Anthropic Claude models support tools and function calling to enhance a
model's capabilities. For more information, see the
Tool use overview in the Anthropic documentation. The following samples demonstrate how to use tools by using an SDK or curl
command. The samples search for nearby restaurants in San Francisco that are
open. To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python.
For more information, see the
Python API reference documentation.
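The Python tool-use sample was not preserved, so the following is a sketch of declaring the restaurant-search tool and passing it to the model, again assuming the anthropic[vertex] package. The tool name text_search_places_api is the sample name used in this guide, not a real API binding.

```python
# Tool declaration mirroring the request body shown in the REST sample below.
TOOLS = [{
    "name": "text_search_places_api",
    "description": "Returns information about a set of places based on a string",
    "input_schema": {
        "type": "object",
        "properties": {
            "textQuery": {"type": "string",
                          "description": "The text string on which to search"},
            "openNow": {"type": "boolean",
                        "description": "Whether a place is currently open"},
        },
        "required": ["textQuery"],
    },
}]

def call_with_tools(project_id: str, region: str, model: str, prompt: str):
    """Send a prompt with the tool declared, returning the raw message."""
    # Imported inside the function so the sketch can be read without the SDK installed.
    from anthropic import AnthropicVertex  # pip install "anthropic[vertex]"

    client = AnthropicVertex(project_id=project_id, region=region)
    return client.messages.create(
        model=model,
        max_tokens=1024,
        tools=TOOLS,
        messages=[{"role": "user", "content": prompt}],
    )
```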
Before using any of the request data, make the following replacements:

- MAX_TOKENS: Specify a lower value for shorter responses and a higher value for potentially longer responses.
HTTP method and URL:
Request JSON body:
To send your request, choose one of these options:

curl: Save the request body in a file named request.json, and execute the curl command.

PowerShell: Save the request body in a file named request.json, and execute the Invoke-WebRequest command.

You should receive a JSON response similar to the following.
claude-opus-4-1@20250805
claude-opus-4@20250514
claude-sonnet-4@20250514
claude-3-7-sonnet@20250219
claude-3-5-sonnet-v2@20241022
claude-3-5-haiku@20241022
claude-3-5-sonnet@20240620
claude-3-opus@20240229
claude-3-haiku@20240307
Model versions must be used with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022) to guarantee consistent behavior.
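As a sketch of where the versioned model name ends up, the publisher endpoint path embeds it verbatim, @ suffix included. The region and project values below are placeholders.

```python
def publisher_endpoint(location: str, project_id: str, model: str,
                       method: str = "streamRawPredict") -> str:
    """Build the Vertex AI publisher-model endpoint URL for an Anthropic model."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/anthropic/models/{model}:{method}"
    )

url = publisher_endpoint("us-east5", "my-project", "claude-3-7-sonnet@20250219")
```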
Before you begin
The Vertex AI API (aiplatform.googleapis.com) must be enabled to use Vertex AI. If you already have an existing project with the Vertex AI API enabled, you can use that project instead of creating a new project.
Use the Anthropic SDK
Make a streaming call to a Claude model using the Anthropic Vertex SDK
Python
Make a unary call to a Claude model using the Anthropic Vertex SDK
Python
Use a curl command
Model versions must be used with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022) to guarantee consistent behavior.
REST
- ROLE: The role associated with a message: user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- STREAM: true to stream the response and false to return the response all at once.
- CONTENT: The content, such as text, of the user or assistant message.
- TOP_P (optional): For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model will select either A or B as the next token by using temperature and excludes C as a candidate.
- TOP_K (optional): A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
- TYPE: For extended thinking, set to enabled.
- BUDGET_TOKENS: The token budget for extended thinking. Must be at least 1024 but less than MAX_TOKENS.
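The top-K and top-P interaction described above can be illustrated with a toy filter over made-up token probabilities. This is only an illustration of the described pipeline, not the model's actual sampler.

```python
def filter_candidates(probs: dict[str, float], top_k: int, top_p: float) -> list[str]:
    """Keep the top_k most probable tokens, then trim to cumulative top_p mass."""
    ranked = sorted(probs, key=probs.get, reverse=True)[:top_k]
    kept, total = [], 0.0
    for tok in ranked:
        if total >= top_p:
            break
        kept.append(tok)
        total += probs[tok]
    return kept

# With probabilities 0.3/0.2/0.1 and top_p=0.5, only A and B survive as candidates.
candidates = filter_candidates({"A": 0.3, "B": 0.2, "C": 0.1}, top_k=3, top_p=0.5)
```

The final token would then be drawn from the surviving candidates using temperature sampling.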
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict
{
"anthropic_version": "vertex-2023-10-16",
"messages": [
{
"role": "ROLE",
"content": "CONTENT"
}],
"max_tokens": MAX_TOKENS,
"stream": STREAM,
"thinking": {
"type": "TYPE",
"budget_tokens": BUDGET_TOKENS
}
}
curl

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict"PowerShell
request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict" | Select-Object -Expand Content

Example curl command
MODEL_ID="MODEL"
LOCATION="us-central1"
PROJECT_ID="PROJECT_ID"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/anthropic/models/${MODEL_ID}:streamRawPredict -d \
'{
"anthropic_version": "vertex-2023-10-16",
"messages": [{
"role": "user",
"content": "Hello!"
}],
"max_tokens": 50,
"stream": true}'
Tool use (function calling)
Python
REST
The following sample uses regional endpoints. To use the global endpoint, see Specify
the global endpoint.
- ROLE: The role associated with a message: user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- STREAM: true to stream the response and false to return the response all at once.
- CONTENT: The content, such as text, of the user or assistant message.

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict
{
"anthropic_version": "vertex-2023-10-16",
"max_tokens": MAX_TOKENS,
"stream": STREAM,
"tools": [
{
"name": "text_search_places_api",
"description": "Returns information about a set of places based on a string",
"input_schema": {
"type": "object",
"properties": {
"textQuery": {
"type": "string",
"description": "The text string on which to search"
},
"priceLevels": {
"type": "array",
"description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]",
},
"openNow": {
"type": "boolean",
"description": "Describes whether a place is open for business at
the time of the query."
},
},
"required": ["textQuery"]
}
}
],
"messages": [
{
"role": "user",
"content": "What are some affordable and good Italian restaurants that are open now in San Francisco??"
}
]
}
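When the model decides to use the tool, its response contains a tool_use block that your code must execute and answer with a tool_result message, as described in the Tool use overview. The dispatcher below is a sketch; the places lookup is a stand-in, since text_search_places_api is only a declared tool name in this guide, not a real binding.

```python
def text_search_places_api(textQuery, priceLevels=None, openNow=None):
    """Stand-in for a real places lookup; returns canned data."""
    return {"places": [{"name": "Example Trattoria", "open_now": True}]}

def dispatch(tool_use_block: dict) -> dict:
    """Run the requested tool and wrap its output as a tool_result message."""
    handlers = {"text_search_places_api": text_search_places_api}
    handler = handlers[tool_use_block["name"]]
    output = handler(**tool_use_block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_block["id"],
        "content": str(output),
    }

# Shape of a tool_use block as it appears in a model response (the id is made up).
block = {"type": "tool_use", "id": "toolu_01", "name": "text_search_places_api",
         "input": {"textQuery": "Italian restaurants in San Francisco", "openNow": True}}
result = dispatch(block)
```

The tool_result content is then sent back to the model in a follow-up user message so it can compose its final answer.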
curl

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict"PowerShell
request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict" | Select-Object -Expand Content
Use Vertex AI Studio
For some of the Anthropic Claude models, you can use Vertex AI Studio to quickly prototype and test generative AI models in the Google Cloud console. As an example, you can use Vertex AI Studio to compare Claude model responses with other supported models such as Google Gemini.
For more information, see Quickstart: Send text prompts to Gemini using Vertex AI Studio.
Anthropic Claude quotas and region availability
Claude models have regional quotas and, for models that support a global endpoint, a global quota. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.
To maintain overall service performance and acceptable use, the maximum quotas might vary by account and, in some cases, access might be restricted. View your project's quotas on the Quotas & System Limits page in the Google Cloud console. You must also have the following quotas available:
online_prediction_requests_per_base_model and global_online_prediction_requests_per_base_model define your QPM quota.

For TPM, there are three quota values that apply to particular models:

- For models that count input and output tokens together, online_prediction_tokens_per_minute_per_base_model and global_online_prediction_tokens_per_minute_per_base_model define the model TPM quota.
- For models that count input and output tokens separately, online_prediction_input_tokens_per_minute_per_base_model and global_online_prediction_input_tokens_per_minute_per_base_model define the input TPM quota, and online_prediction_output_tokens_per_minute_per_base_model and global_online_prediction_output_tokens_per_minute_per_base_model define the output TPM quota.

To see which models count input and output tokens separately, see Quotas by model and region.
Input tokens
The following list defines the input tokens that can count towards your input TPM quota. The input tokens that each model counts can vary. To see which input tokens a model counts, see Quotas by model and region.
- Input tokens includes all input tokens, including cache read and cache write tokens.
- Uncached input tokens includes only the input tokens that weren't read from a cache; cache read tokens are excluded.
- Cache write tokens includes tokens that were used to create or update a cache.
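A small worked example of these definitions, using made-up token counts:

```python
# Made-up counts for one request; only the relationships matter here.
fresh_input = 600        # prompt tokens not involving the cache
cache_read = 300         # tokens read from an existing cache
cache_write = 100        # tokens written to create or update a cache

input_tokens = fresh_input + cache_read + cache_write   # all input tokens
uncached_input_tokens = input_tokens - cache_read       # excludes cache reads
cache_write_tokens = cache_write                        # cache writes only
```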
Quotas by model and region
The following table shows the default quotas and supported context length for each model in each region.
| Model | Region | Quotas | Context length |
|---|---|---|---|
| Claude Opus 4.1 | us-east5 | | 200,000 |
| | global endpoint | | 200,000 |
| Claude Opus 4 | us-east5 | | 200,000 |
| | global endpoint | | 200,000 |
| Claude Sonnet 4 | us-east5 | | 1,000,000 |
| | europe-west1 | | 1,000,000 |
| | asia-east1 | | 1,000,000 |
| | global endpoint | | 1,000,000 |
| Claude 3.7 Sonnet | us-east5 | | 200,000 |
| | europe-west1 | | 200,000 |
| | global endpoint | | 200,000 |
| Claude 3.5 Sonnet v2 | us-east5 | | 200,000 |
| | europe-west1 | | 200,000 |
| | global endpoint | | 200,000 |
| Claude 3.5 Haiku | us-east5 | | 200,000 |
| | europe-west1 | | 200,000 |
| Claude 3.5 Sonnet | us-east5 | | 200,000 |
| | europe-west1 | | 200,000 |
| | asia-southeast1 | | 200,000 |
| Claude 3 Opus | us-east5 | | 200,000 |
| Claude 3 Haiku | us-east5 | | 200,000 |
| | europe-west1 | | 200,000 |
| | asia-southeast1 | | 200,000 |
If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see Work with quotas.