Request predictions with Claude models

You can use Anthropic's SDK or curl commands to send requests to the Vertex AI endpoint using the following model names:

For Claude Opus 4.1, use claude-opus-4-1@20250805.
For Claude Opus 4, use claude-opus-4@20250514.
For Claude Sonnet 4.5, use claude-sonnet-4-5@20250929.
For Claude Sonnet 4, use claude-sonnet-4@20250514.
For Claude 3.7 Sonnet, use claude-3-7-sonnet@20250219.
For Claude 3.5 Sonnet v2, use claude-3-5-sonnet-v2@20241022.
For Claude Haiku 4.5, use claude-haiku-4-5@20251001.
For Claude 3.5 Haiku, use claude-3-5-haiku@20241022.
For Claude 3.5 Sonnet, use claude-3-5-sonnet@20240620.
For Claude 3 Opus, use claude-3-opus@20240229.
For Claude 3 Haiku, use claude-3-haiku@20240307.

To guarantee consistent behavior, Anthropic Claude model versions must be specified with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022).
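
A quick client-side check can catch a missing version suffix before a request is sent. The following is a minimal sketch: the helper name is_versioned and the regular expression are assumptions inferred from the model names listed above (a name, an @ symbol, then an eight-digit date), not an official specification.

```python
import re

# Assumed pattern, inferred from the listed names: "claude-...", "@", YYYYMMDD.
VERSIONED_MODEL = re.compile(r"^claude-[a-z0-9-]+@\d{8}$")


def is_versioned(model_name: str) -> bool:
    """Return True if the name carries an @YYYYMMDD version suffix."""
    return VERSIONED_MODEL.fullmatch(model_name) is not None


print(is_versioned("claude-3-7-sonnet@20250219"))  # True
print(is_versioned("claude-3-7-sonnet"))           # False
```

The check only validates the shape of the name. It does not confirm that the model exists or is enabled in your project.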

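Before you begin

To use the Anthropic Claude models with Vertex AI, you must perform the following steps. The Vertex AI API (aiplatform.googleapis.com) must be enabled to use Vertex AI. If you already have a project with the Vertex AI API enabled, you can use that project instead of creating a new one.

Make sure that you have the permissions required to enable and use partner models. For more information, see Grant the required permissions.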

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API

Go to one of the following Model Garden model cards and click Enable.

Anthropic recommends that you enable 30-day logging of your prompt and completion activity to record any model misuse. To enable logging, see Log requests and responses.

Use the Anthropic SDK

You can make API requests to the Anthropic Claude models using the Anthropic Claude SDK. To learn more, see the following:

Make a streaming call to a Claude model using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to make a streaming call to a Claude model.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

The following sample uses regional endpoints. To use the global endpoint, see Specify the global endpoint.

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
result = []

with client.messages.stream(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
        result.append(text)

# Example response:
# Here's a simple recipe for delicious banana bread:
# Ingredients:
# - 2-3 ripe bananas, mashed
# - 1/3 cup melted butter
# ...
# ...
# 8. Bake for 50-60 minutes, or until a toothpick inserted into the center comes out clean.
# 9. Let cool in the pan for a few minutes, then remove and cool completely on a wire rack.

Make a unary call to a Claude model using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to make a unary (non-streaming) call to a Claude model.

Python

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
)
print(message.model_dump_json(indent=2))
# Example response:
# {
#   "id": "msg_vrtx_0162rhgehxa9rvJM5BSVLZ9j",
#   "content": [
#     {
#       "text": "Here's a simple recipe for delicious banana bread:\n\nIngredients:\n- 2-3 ripe bananas...
#   ...

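Use a curl command

You can use a curl command to send a request to the Vertex AI endpoint. The curl command specifies which supported Claude model you want to use.

The following topic shows how to build a curl command and includes a sample command.

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Anthropic Claude models. To use the global endpoint, see Specify the global endpoint.
MODEL: The name of the model to use.
ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, the response continues immediately from the content of that message. You can use this to constrain part of the model's response.
STREAM: A boolean that specifies whether the response is streamed. Streaming the response reduces the latency that end users perceive. Set to true to stream the response, or false to return the whole response at once.
CONTENT: The content, such as text, of the user or assistant message.
MAX_TOKENS: The maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.
TOP_P (Optional): Top-P changes how the model selects tokens for output. Tokens are selected from the most probable to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, the model selects either A or B as the next token by using temperature and excludes C as a candidate.
Specify a lower value for less random responses and a higher value for more random responses.
TOP_K (Optional): Top-K changes how the model selects tokens for output. A top-K of 1 means that the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding). A top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
For each token selection step, the top-K tokens with the highest probabilities are sampled. The tokens are then further filtered based on top-P, and the final token is selected using temperature sampling.
Specify a lower value for less random responses and a higher value for more random responses.
TYPE: To enable extended thinking mode on Claude 3.7 Sonnet and later Claude models, specify enabled.
BUDGET_TOKENS: If you enable extended thinking, you must specify the number of tokens that the model can use for its internal reasoning as part of the output. Larger budgets allow more thorough analysis of complex problems and can improve response quality. Specify a value greater than or equal to 1024 and less than MAX_TOKENS.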
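
The TOP_K and TOP_P descriptions above amount to a two-stage filter over the candidate tokens. The following sketch illustrates that filtering on a toy distribution. It is not the service's implementation: the function name filter_candidates and the small floating-point tolerance are assumptions made for illustration, and the final temperature sampling step is omitted.

```python
from typing import Dict, List, Optional


def filter_candidates(
    probs: Dict[str, float],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> List[str]:
    """Return the tokens that survive top-K filtering, then top-P filtering."""
    # Rank tokens from most probable to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # Top-K: keep only the K most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is None:
        return [token for token, _ in ranked]

    # Top-P: keep tokens until their cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        if cumulative >= top_p - 1e-9:  # tolerance for float rounding
            break
    return kept


# The example from the TOP_P description: C is excluded.
print(filter_candidates({"A": 0.3, "B": 0.2, "C": 0.1}, top_p=0.5))  # ['A', 'B']
# A top-K of 1 leaves only the most probable token (greedy decoding).
print(filter_candidates({"A": 0.3, "B": 0.2, "C": 0.1}, top_k=1))    # ['A']
```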

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict

Request JSON body:

{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [
   {
    "role": "ROLE",
    "content": "CONTENT"
   }],
  "max_tokens": MAX_TOKENS,
  "stream": STREAM,
  "thinking": {
    "type": "TYPE",
    "budget_tokens": BUDGET_TOKENS
  }
}
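
If you generate request.json from a script, the constraints described above can be checked before the request is sent. The following is a minimal sketch: the helper name build_request_body is hypothetical, and the checks are client-side conveniences based on the ROLE and BUDGET_TOKENS descriptions, not validation performed by any SDK.

```python
import json

ANTHROPIC_VERSION = "vertex-2023-10-16"  # value shown in the request body above


def build_request_body(messages, max_tokens, stream=False, budget_tokens=None):
    """Assemble the request body as a dict, validating documented constraints."""
    # The first message must use the user role.
    if not messages or messages[0]["role"] != "user":
        raise ValueError("The first message must use the user role.")

    body = {
        "anthropic_version": ANTHROPIC_VERSION,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }

    # Extended thinking is optional; the budget must be >= 1024 and < max_tokens.
    if budget_tokens is not None:
        if not 1024 <= budget_tokens < max_tokens:
            raise ValueError("budget_tokens must be >= 1024 and < max_tokens.")
        body["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return body


body = build_request_body(
    [{"role": "user", "content": "Hello!"}],
    max_tokens=4096,
    stream=True,
    budget_tokens=2048,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be saved as request.json and sent with the commands that follow.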

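To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which logs you in to the gcloud CLI automatically. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: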

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict"

PowerShell

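Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: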

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

{
  "id":"msg_012NDLxqh6LsztWCU7zTb14C",
  "type":"message",
  "role":"assistant",
  "content":[{
    "type":"text",
    "text":"Hello! Nice to meet you."
  }],
  "model":"claude-2.1",
  "stop_reason":"end_turn",
  "stop_sequence":null,
  "usage":{
    "input_tokens":11,
    "output_tokens":11
  }
}
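
To work with a non-streaming response in code, you typically join the text blocks in content and read the token counts from usage. The following sketch parses the sample response shown above. The helper name extract_text_and_usage is hypothetical.

```python
import json

# The sample response shown above, as a JSON string.
SAMPLE_RESPONSE = """
{
  "id": "msg_012NDLxqh6LsztWCU7zTb14C",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello! Nice to meet you."}],
  "model": "claude-2.1",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 11, "output_tokens": 11}
}
"""


def extract_text_and_usage(response: dict):
    """Return the concatenated text blocks and the total token count."""
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response["usage"]
    return text, usage["input_tokens"] + usage["output_tokens"]


text, total_tokens = extract_text_and_usage(json.loads(SAMPLE_RESPONSE))
print(text)          # Hello! Nice to meet you.
print(total_tokens)  # 22
```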

Example curl command

MODEL_ID="MODEL"
LOCATION="us-central1"
PROJECT_ID="PROJECT_ID"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/anthropic/models/${MODEL_ID}:streamRawPredict -d \
'{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [{
    "role": "user",
    "content": "Hello!"
  }],
  "max_tokens": 50,
  "stream": true}'

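Tool use (function calling)

The Anthropic Claude models support tools and function calling to extend a model's capabilities. For more information, see the Tool use overview in the Anthropic documentation. Claude models on Vertex AI support client tools but do not support server tools.

The following samples show how to use tools with an SDK or a curl command. The samples search for nearby restaurants in San Francisco that are currently open.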

Python

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'
from anthropic import AnthropicVertex

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    tools=[
        {
            "name": "text_search_places_api",
            "description": "returns information about a set of places based on a string",
            "input_schema": {
                "type": "object",
                "properties": {
                    "textQuery": {
                        "type": "string",
                        "description": "The text string on which to search",
                    },
                    "priceLevels": {
                        "type": "array",
                        "description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]",
                    },
                    "openNow": {
                        "type": "boolean",
                        "description": "whether those places are open for business.",
                    },
                },
                "required": ["textQuery"],
            },
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "What are some affordable and good Italian restaurants open now in San Francisco??",
        }
    ],
)
print(message.model_dump_json(indent=2))
# Example response:
# {
#   "id": "msg_vrtx_018pk1ykbbxAYhyWUdP1bJoQ",
#   "content": [
#     {
#       "text": "To answer your question about affordable and good Italian restaurants
#       that are currently open in San Francisco....
# ...

REST

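Before using any of the request data, make the following replacements:

LOCATION: A region that supports Anthropic Claude models. To use the global endpoint, see Specify the global endpoint.
MODEL: The name of the model to use.
ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, the response continues immediately from the content of that message. You can use this to constrain part of the model's response.
STREAM: A boolean that specifies whether the response is streamed. Streaming the response reduces the latency that end users perceive. Set to true to stream the response, or false to return the whole response at once.
CONTENT: The content, such as text, of the user or assistant message.
MAX_TOKENS: The maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict

Request JSON body:
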
{
  "anthropic_version": "vertex-2023-10-16",
  "max_tokens": MAX_TOKENS,
  "stream": STREAM,
  "tools": [
    {
      "name": "text_search_places_api",
      "description": "Returns information about a set of places based on a string",
      "input_schema": {
        "type": "object",
        "properties": {
          "textQuery": {
            "type": "string",
            "description": "The text string on which to search"
          },
          "priceLevels": {
            "type": "array",
            "description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]"
          },
          "openNow": {
            "type": "boolean",
            "description": "Describes whether a place is open for business at the time of the query."
          }
        },
        "required": ["textQuery"]
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "What are some affordable and good Italian restaurants that are open now in San Francisco??"
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and then run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict"

PowerShell

Save the request body in a file named request.json, and then run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

{
  "id": "msg_vrtx_01ErR7VMNQdnvDt3n7Nmc4ER",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-opus-20240229",
  "content": [
    {
      "type": "text",
      "text": "\nTo find affordable and good Italian restaurants that are currently open in San Francisco, the text_search_places_api tool seems most relevant. \n\nThe required textQuery parameter can be inferred as \"Italian restaurants in San Francisco\", since the user specified Italian restaurants and the location of San Francisco.\n\nTwo optional parameters are also relevant:\nopenNow - this should be set to true, since the user specified they want restaurants open now\npriceLevels - to find affordable restaurants, this can be set to [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE]\n\nWith the textQuery provided and the two optional parameters that can help narrow the results to match the user's criteria, we have enough information to make a good call to the text_search_places_api tool to try to answer the user's request.\n"
    },
    {
      "type": "tool_use",
      "id": "toolu_vrtx_01TAJCTkxe8HhRoaQ69N4ouP",
      "name": "text_search_places_api",
      "input": {
        "textQuery": "Italian restaurants in San Francisco",
        "openNow": true,
        "priceLevels": [
          "PRICE_LEVEL_INEXPENSIVE",
          "PRICE_LEVEL_MODERATE"
        ]
      }
    }
  ],
  "stop_reason": "tool_use",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 727,
    "output_tokens": 308
  }
}
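
Because Vertex AI supports client tools, your application runs the tool itself when the response has a stop_reason of tool_use, then sends the output back in a follow-up user message that contains tool_result blocks. The following sketch shows that step for the response above, abbreviated to the fields it uses. The local function text_search_places_api is a hypothetical stand-in that returns canned data, and the helper name run_tool_calls is an assumption made for illustration.

```python
import json


def text_search_places_api(textQuery, priceLevels=None, openNow=None):
    """Hypothetical stand-in for a real places search; returns canned data."""
    return [{"name": "Example Trattoria", "query": textQuery, "openNow": bool(openNow)}]


# Map tool names from the request's "tools" list to local callables.
TOOLS = {"text_search_places_api": text_search_places_api}


def run_tool_calls(response: dict) -> dict:
    """Run every tool_use block and return a user message of tool_result blocks."""
    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue
        output = TOOLS[block["name"]](**block["input"])
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],  # ties the result to the tool call
                "content": json.dumps(output),
            }
        )
    return {"role": "user", "content": results}


# Abbreviated copy of the response shown above.
response = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "..."},
        {
            "type": "tool_use",
            "id": "toolu_vrtx_01TAJCTkxe8HhRoaQ69N4ouP",
            "name": "text_search_places_api",
            "input": {
                "textQuery": "Italian restaurants in San Francisco",
                "openNow": True,
                "priceLevels": ["PRICE_LEVEL_INEXPENSIVE", "PRICE_LEVEL_MODERATE"],
            },
        },
    ],
}

follow_up = run_tool_calls(response)
print(json.dumps(follow_up, indent=2))
```

To continue the conversation, append the assistant's content and then follow_up to the messages list and send the request again with the same tools definition.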

Use Vertex AI Studio

For some of the Anthropic Claude models, you can use Vertex AI Studio in the Google Cloud console to quickly prototype and test generative AI models. For example, you can use Vertex AI Studio to compare Claude model responses with those of other supported models, such as Google Gemini.

For more information, see Quickstart: Send text prompts to Gemini using Vertex AI Studio.

Anthropic Claude quotas and region availability

Claude models have regional quotas and, for models that support a global endpoint, a global quota. Quotas are specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.

To maintain overall service performance and acceptable use, maximum quotas might vary by account, and in some cases access might be restricted. View your project's quotas on the Quotas & System Limits page in the Google Cloud console. You must also have the following quotas available:

online_prediction_requests_per_base_model and global_online_prediction_requests_per_base_model define your QPM quota.
For TPM, there are three quota values that apply to particular models:
- For models that count input and output tokens together, online_prediction_tokens_per_minute_per_base_model and global_online_prediction_tokens_per_minute_per_base_model define the model's TPM quota.
- For models that count input and output tokens separately, online_prediction_input_tokens_per_minute_per_base_model and global_online_prediction_input_tokens_per_minute_per_base_model define the input TPM quota, and online_prediction_output_tokens_per_minute_per_base_model and global_online_prediction_output_tokens_per_minute_per_base_model define the output TPM quota.
To see which models count input and output tokens separately, see Quotas by model and region.

Input tokens

The following list defines the input tokens that can count towards your input TPM quota. Which input tokens are counted varies by model. To see which input tokens a model counts, see Quotas by model and region.

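Input tokens include all input tokens, including cache read and cache write tokens.
Uncached input tokens include only the input tokens that weren't read from a cache (that is, they exclude cache read tokens).
Cache write tokens include the tokens that were used to create or update a cache.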
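
The counting rules above can be expressed as a small function. This is a sketch under stated assumptions: it treats uncached, cache read, and cache write tokens as three disjoint counts, and the function name and rule labels are hypothetical. Check Quotas by model and region for the rule that applies to a given model.

```python
def counted_input_tokens(uncached: int, cache_read: int, cache_write: int, rule: str) -> int:
    """Return how many input tokens count towards input TPM under a given rule.

    Assumes the three token counts are disjoint. Rule labels are illustrative.
    """
    if rule == "all_input":
        # All input tokens, including cache reads and cache writes.
        return uncached + cache_read + cache_write
    if rule == "uncached":
        # Only tokens that were not read from a cache.
        return uncached
    if rule == "uncached_and_cache_write":
        # Uncached tokens plus tokens used to create or update a cache.
        return uncached + cache_write
    raise ValueError(f"Unknown rule: {rule}")


# A request with 1,000 uncached, 8,000 cache read, and 2,000 cache write tokens.
print(counted_input_tokens(1_000, 8_000, 2_000, "all_input"))                 # 11000
print(counted_input_tokens(1_000, 8_000, 2_000, "uncached"))                  # 1000
print(counted_input_tokens(1_000, 8_000, 2_000, "uncached_and_cache_write"))  # 3000
```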

Quotas by model and region

The following table shows the default quotas and supported context length for each model in each region.

Model	Region	Quotas	Context length
Claude Opus 4.1
	`us-east5`	QPM: 25, Input TPM: 60,000 (uncached and cache write), Output TPM: 6,000	200,000
	`global endpoint`	QPM: 25, Input TPM: 60,000 (uncached and cache write), Output TPM: 6,000	200,000
Claude Opus 4
	`us-east5`	QPM: 25, Input TPM: 60,000 (uncached and cache write), Output TPM: 6,000	200,000
	`global endpoint`	QPM: 25, Input TPM: 60,000 (uncached and cache write), Output TPM: 6,000	200,000
Claude Sonnet 4.5
	`us-east5`	QPM: 1,500, Input TPM: 1,500,000 (uncached and cache write), Output TPM: 150,000	1,000,000 (beta), 200,000 (GA)
	`europe-west1`	QPM: 1,800, Input TPM: 1,800,000 (uncached and cache write), Output TPM: 180,000	1,000,000 (beta), 200,000 (GA)
	`asia-southeast1`	QPM: 1,500, Input TPM: 1,500,000 (uncached and cache write), Output TPM: 150,000	1,000,000 (beta), 200,000 (GA)
	`global endpoint`	QPM: 1,500, Input TPM: 1,500,000 (uncached and cache write), Output TPM: 150,000	1,000,000 (beta), 200,000 (GA)
Claude Sonnet 4
	`us-east5`	QPM: 35, Input TPM: 280,000 (uncached and cache write), Output TPM: 20,000	1,000,000
	`europe-west1`	QPM: 25, Input TPM: 180,000 (uncached and cache write), Output TPM: 20,000	1,000,000
	`asia-east1`	QPM: 70, Input TPM: 550,000 (uncached and cache write), Output TPM: 50,000	1,000,000
	`global endpoint`	QPM: 35, Input TPM: 276,000 (uncached and cache write), Output TPM: 24,000	1,000,000
Claude 3.7 Sonnet
	`us-east5`	QPM: 55, TPM: 500,000 (uncached input and output)	200,000
	`europe-west1`	QPM: 40, TPM: 300,000 (uncached input and output)	200,000
	`global endpoint`	QPM: 35, TPM: 300,000 (uncached input and output)	200,000
Claude 3.5 Sonnet v2
	`us-east5`	QPM: 90, TPM: 540,000 (input and output)	200,000
	`europe-west1`	QPM: 55, TPM: 330,000 (input and output)	200,000
	`global endpoint`	QPM: 25, TPM: 140,000 (input and output)	200,000
Claude Haiku 4.5
	`us-east5`	QPM: 1,500, Input TPM: 1,500,000 (uncached and cache write), Output TPM: 150,000	200,000
	`europe-west1`	QPM: 1,800, Input TPM: 1,800,000 (uncached and cache write), Output TPM: 180,000	200,000
	`asia-east1`	QPM: 2,500, Input TPM: 2,500,000 (uncached and cache write), Output TPM: 250,000	200,000
	`global endpoint`	QPM: 2,500, Input TPM: 2,500,000 (uncached and cache write), Output TPM: 250,000	200,000
Claude 3.5 Haiku
	`us-east5`	QPM: 80, TPM: 350,000 (input and output)	200,000
	`europe-west1`	QPM: 90, TPM: 400,000 (input and output)	200,000
Claude 3.5 Sonnet
	`us-east5`	QPM: 80, TPM: 350,000 (input and output)	200,000
	`europe-west1`	QPM: 130, TPM: 600,000 (input and output)	200,000
	`asia-southeast1`	QPM: 35, TPM: 150,000 (input and output)	200,000
Claude 3 Opus
	`us-east5`	QPM: 20, TPM: 105,000 (input and output)	200,000
Claude 3 Haiku
	`us-east5`	QPM: 245, TPM: 600,000 (input and output)	200,000
	`europe-west1`	QPM: 75, TPM: 181,000 (input and output)	200,000
	`asia-southeast1`	QPM: 70, TPM: 174,000 (input and output)	200,000
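
For models with separate input and output TPM quotas, the tightest of the three limits determines how many requests per minute a workload can sustain. The following sketch does that arithmetic. The function name and the average token counts per request are hypothetical; the quota values in the example are the Claude Opus 4.1 us-east5 defaults from the table.

```python
def max_sustained_requests_per_minute(
    qpm: int,
    input_tpm: int,
    output_tpm: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
) -> int:
    """Return the request rate allowed by the most restrictive quota."""
    return min(
        qpm,                              # limit from queries per minute
        input_tpm // avg_input_tokens,    # limit from input tokens per minute
        output_tpm // avg_output_tokens,  # limit from output tokens per minute
    )


# Assumed workload: 4,000 counted input tokens and 500 output tokens per request.
print(max_sustained_requests_per_minute(25, 60_000, 6_000, 4_000, 500))  # 12
```

In this example the output TPM quota (6,000 / 500 = 12) is the binding limit, below both the QPM quota of 25 and the input TPM limit of 15 requests.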

To increase any of your quotas for Generative AI on Vertex AI, you can request a quota increase in the Google Cloud console. To learn more about quotas, see the Cloud Quotas overview.