Request predictions with Claude models

You can use the Anthropic SDK or curl commands to send requests to the Vertex AI endpoint using the following model names:

For Claude Opus 4.1, use claude-opus-4-1@20250805
For Claude Opus 4, use claude-opus-4@20250514
For Claude Sonnet 4.5, use claude-sonnet-4-5@20250929
For Claude Sonnet 4, use claude-sonnet-4@20250514
For Claude 3.7 Sonnet, use claude-3-7-sonnet@20250219
For Claude 3.5 Sonnet v2, use claude-3-5-sonnet-v2@20241022
For Claude Haiku 4.5, use claude-haiku-4-5@20251001
For Claude 3.5 Haiku, use claude-3-5-haiku@20241022
For Claude 3.5 Sonnet, use claude-3-5-sonnet@20240620
For Claude 3 Opus, use claude-3-opus@20240229
For Claude 3 Haiku, use claude-3-haiku@20240307

Anthropic Claude model versions must be used with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022) to guarantee consistent behavior.
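
As a small illustration of the naming rule above, the following sketch splits a versioned model name into its base name and version suffix, and rejects names that lack the @ suffix. The helper name split_model_version is hypothetical and not part of any SDK, and the eight-digit date format is only inferred from the model names listed on this page.

```python
import re

# Hypothetical helper (not part of the Anthropic or Vertex AI SDKs).
# Assumption: the version suffix is an 8-digit date, as in the names listed above.
_MODEL_RE = re.compile(r"^(?P<base>[a-z0-9.\-]+)@(?P<version>\d{8})$")


def split_model_version(model: str) -> tuple[str, str]:
    """Return (base_name, version); raise ValueError if the @ suffix is missing."""
    match = _MODEL_RE.match(model)
    if match is None:
        raise ValueError(f"model name must end with an @<version> suffix: {model!r}")
    return match.group("base"), match.group("version")


print(split_model_version("claude-3-7-sonnet@20250219"))
```

Running a check like this before sending a request can surface a missing suffix early, instead of waiting for the endpoint to reject the model name.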

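Before you begin

To use the Anthropic Claude models with Vertex AI, you must perform the following steps. The Vertex AI API (aiplatform.googleapis.com) must be enabled to use Vertex AI. If you already have an existing project with the Vertex AI API enabled, you can use that project instead of creating a new project.

Make sure you have the required permissions to enable and use partner models. For more information, see Grant the required permissions.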

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API

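Go to one of the following Model Garden model cards, then click Enable.

Anthropic recommends that you enable 30-day logging of your prompt and completion activity so that any model misuse is recorded. To enable logging, see Log requests and responses.

Use the Anthropic SDK

You can make API requests to the Anthropic Claude models by using the Anthropic Claude SDK. To learn more, see the following:

Make a streaming call to a Claude model by using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to make a streaming call to a Claude model.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

The following sample uses regional endpoints. To use the global endpoint, see Specify the global endpoint.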

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
result = []

with client.messages.stream(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
        result.append(text)

# Example response:
# Here's a simple recipe for delicious banana bread:
# Ingredients:
# - 2-3 ripe bananas, mashed
# - 1/3 cup melted butter
# ...
# ...
# 8. Bake for 50-60 minutes, or until a toothpick inserted into the center comes out clean.
# 9. Let cool in the pan for a few minutes, then remove and cool completely on a wire rack.

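Make a unary call to a Claude model by using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to make a unary call to a Claude model.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

The following sample uses regional endpoints. To use the global endpoint, see Specify the global endpoint.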

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
)
print(message.model_dump_json(indent=2))
# Example response:
# {
#   "id": "msg_vrtx_0162rhgehxa9rvJM5BSVLZ9j",
#   "content": [
#     {
#       "text": "Here's a simple recipe for delicious banana bread:\n\nIngredients:\n- 2-3 ripe bananas...
#   ...

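Use a curl command

You can use a curl command to make a request to the Vertex AI endpoint. The curl command specifies which supported Claude model you want to use.

The following topic shows you how to create a curl command and includes a sample curl command.

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

The following sample uses regional endpoints. To use the global endpoint, see Specify the global endpoint.

Before using any of the request data, make the following replacements: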

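LOCATION: A region that supports Anthropic Claude models. To use the global endpoint, see Specify the global endpoint.
MODEL: The name of the model to use.
ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
STREAM: A boolean that specifies whether the response is streamed. Stream the response to reduce the latency perceived by the end user. Set to true to stream the response and false to return the response all at once.
CONTENT: The content, such as text, of the user or assistant message.
MAX_TOKENS: The maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters of English text. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.
TOP_P: (Optional) Top-P changes how the model selects tokens for output. Tokens are selected from the most probable to the least probable until the sum of their probabilities equals the Top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the Top-P value is 0.5, the model selects either A or B as the next token by using temperature and excludes C as a candidate.
Specify a lower value for less random responses and a higher value for more random responses.
TOP_K: (Optional) Top-K changes how the model selects tokens for output. A Top-K of 1 means that the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding). A Top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
For each token selection step, the Top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on Top-P, and the final token is selected by using temperature sampling.
Specify a lower value for less random responses and a higher value for more random responses.
TYPE: For Claude 3.7 Sonnet and later Claude models, specify enabled to turn on extended thinking mode.
BUDGET_TOKENS: If you enable extended thinking, you must specify the number of tokens that the model can use for its internal reasoning as part of the output. Larger budgets can allow more thorough analysis of complex problems and improve response quality. You must specify a value greater than or equal to 1024 and less than MAX_TOKENS.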
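
The interaction between Top-K and Top-P described above can be illustrated with a toy filter. This is only a sketch of the candidate-narrowing step using the A, B, C example from the parameter list; it is not the model's actual sampler, the function name filter_candidates is hypothetical, and the final temperature-based choice among the surviving tokens is left out.

```python
def filter_candidates(probs: dict[str, float], top_k: int, top_p: float) -> list[str]:
    """Apply Top-K, then Top-P, and return the surviving candidate tokens."""
    # Rank tokens from most to least probable and keep only the Top-K of them.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    kept: list[str] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        # Stop once the running probability mass reaches Top-P
        # (small tolerance guards against floating-point rounding).
        if cumulative >= top_p - 1e-9:
            break
    return kept


example = {"A": 0.3, "B": 0.2, "C": 0.1}
print(filter_candidates(example, top_k=3, top_p=0.5))  # ['A', 'B']
print(filter_candidates(example, top_k=1, top_p=0.5))  # ['A']
```

With Top-K set to 1 only the single most probable token survives, which matches the greedy decoding case mentioned above.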

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict
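
The URL above follows a fixed pattern, so it can be assembled from its placeholders. The following sketch assumes a regional endpoint only; the function name build_endpoint_url and the sample project ID are hypothetical. It switches between the streamRawPredict and rawPredict methods, both of which appear in the samples on this page.

```python
def build_endpoint_url(project_id: str, location: str, model: str, stream: bool = True) -> str:
    """Build the regional publisher-model URL for an Anthropic model on Vertex AI."""
    # streamRawPredict and rawPredict are the two methods used in this page's samples.
    method = "streamRawPredict" if stream else "rawPredict"
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/anthropic/models/{model}:{method}"
    )


# "my-project" is a placeholder project ID used only for illustration.
print(build_endpoint_url("my-project", "us-east5", "claude-3-5-sonnet-v2@20241022"))
```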

Request JSON body:

{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [
   {
    "role": "ROLE",
    "content": "CONTENT"
   }],
  "max_tokens": MAX_TOKENS,
  "stream": STREAM,
  "thinking": {
    "type": "TYPE",
    "budget_tokens": BUDGET_TOKENS
  }
}
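
The request body above can also be generated programmatically. The following is a minimal sketch, assuming a single user message; the function name build_request_body is hypothetical. It adds the thinking object only when a budget is supplied, and it keeps the budget at 1024 or more and strictly below max_tokens, which is a conservative reading of the BUDGET_TOKENS constraint.

```python
import json
from typing import Optional


def build_request_body(content: str, max_tokens: int, stream: bool = False,
                       budget_tokens: Optional[int] = None) -> dict:
    """Build a request body with one user message and optional extended thinking."""
    body = {
        "anthropic_version": "vertex-2023-10-16",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "stream": stream,
    }
    if budget_tokens is not None:
        # Conservative check: at least 1024 and strictly below max_tokens.
        if not 1024 <= budget_tokens < max_tokens:
            raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
        body["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return body


# Print the JSON that could be saved as request.json.
print(json.dumps(build_request_body("Hello!", max_tokens=2048, budget_tokens=1024), indent=2))
```

Omitting the thinking object entirely, rather than sending it with placeholder values, keeps the same helper usable for models or requests that do not use extended thinking.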

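To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: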

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict"

PowerShell

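Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: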

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

{
  "id":"msg_012NDLxqh6LsztWCU7zTb14C",
  "type":"message",
  "role":"assistant",
  "content":[{
    "type":"text",
    "text":"Hello! Nice to meet you."
  }],
  "model":"claude-2.1",
  "stop_reason":"end_turn",
  "stop_sequence":null,
  "usage":{
    "input_tokens":11,
    "output_tokens":11
  }
}

Example curl command

MODEL_ID="MODEL"
LOCATION="us-central1"
PROJECT_ID="PROJECT_ID"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/anthropic/models/${MODEL_ID}:streamRawPredict -d \
'{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [{
    "role": "user",
    "content": "Hello!"
  }],
  "max_tokens": 50,
  "stream": true}'

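Tool use (function calling)

The Anthropic Claude models support tools and function calling to enhance a model's capabilities. For more information, see the Tool use overview in the Anthropic documentation. Claude models on Vertex AI support client tools but not server tools.

The following samples show how to use tools by using an SDK or a curl command. The samples search for restaurants near San Francisco that are currently open.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

The following sample uses regional endpoints. To use the global endpoint, see Specify the global endpoint.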

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'
from anthropic import AnthropicVertex

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    tools=[
        {
            "name": "text_search_places_api",
            "description": "returns information about a set of places based on a string",
            "input_schema": {
                "type": "object",
                "properties": {
                    "textQuery": {
                        "type": "string",
                        "description": "The text string on which to search",
                    },
                    "priceLevels": {
                        "type": "array",
                        "description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]",
                    },
                    "openNow": {
                        "type": "boolean",
                        "description": "whether those places are open for business.",
                    },
                },
                "required": ["textQuery"],
            },
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "What are some affordable and good Italian restaurants open now in San Francisco??",
        }
    ],
)
print(message.model_dump_json(indent=2))
# Example response:
# {
#   "id": "msg_vrtx_018pk1ykbbxAYhyWUdP1bJoQ",
#   "content": [
#     {
#       "text": "To answer your question about affordable and good Italian restaurants
#       that are currently open in San Francisco....
# ...

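REST

The following sample uses regional endpoints. To use the global endpoint, see Specify the global endpoint.

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Anthropic Claude models. To use the global endpoint, see Specify the global endpoint.
MODEL: The name of the model to use.
ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
STREAM: A boolean that specifies whether the response is streamed. Stream the response to reduce the latency perceived by the end user. Set to true to stream the response and false to return the response all at once.
CONTENT: The content, such as text, of the user or assistant message.
MAX_TOKENS: The maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters of English text. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.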

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict

Request JSON body:

{
  "anthropic_version": "vertex-2023-10-16",
  "max_tokens": MAX_TOKENS,
  "stream": STREAM,
  "tools": [
    {
      "name": "text_search_places_api",
      "description": "Returns information about a set of places based on a string",
      "input_schema": {
        "type": "object",
        "properties": {
          "textQuery": {
            "type": "string",
            "description": "The text string on which to search"
          },
          "priceLevels": {
            "type": "array",
            "description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]"
          },
          "openNow": {
            "type": "boolean",
            "description": "Describes whether a place is open for business at the time of the query."
          }
        },
        "required": ["textQuery"]
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "What are some affordable and good Italian restaurants that are open now in San Francisco??"
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

{
  "id": "msg_vrtx_01ErR7VMNQdnvDt3n7Nmc4ER",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-opus-20240229",
  "content": [
    {
      "type": "text",
      "text": "\nTo find affordable and good Italian restaurants that are currently open in San Francisco, the text_search_places_api tool seems most relevant. \n\nThe required textQuery parameter can be inferred as \"Italian restaurants in San Francisco\", since the user specified Italian restaurants and the location of San Francisco.\n\nTwo optional parameters are also relevant:\nopenNow - this should be set to true, since the user specified they want restaurants open now\npriceLevels - to find affordable restaurants, this can be set to [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE]\n\nWith the textQuery provided and the two optional parameters that can help narrow the results to match the user's criteria, we have enough information to make a good call to the text_search_places_api tool to try to answer the user's request.\n"
    },
    {
      "type": "tool_use",
      "id": "toolu_vrtx_01TAJCTkxe8HhRoaQ69N4ouP",
      "name": "text_search_places_api",
      "input": {
        "textQuery": "Italian restaurants in San Francisco",
        "openNow": true,
        "priceLevels": [
          "PRICE_LEVEL_INEXPENSIVE",
          "PRICE_LEVEL_MODERATE"
        ]
      }
    }
  ],
  "stop_reason": "tool_use",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 727,
    "output_tokens": 308
  }
}
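
When stop_reason is tool_use, as in the response above, the calling application is expected to run the requested tool itself and return the outcome in a follow-up user message. The following sketch shows one way to pull the tool_use blocks out of a parsed response and build the matching tool_result message. The helper names and the sample result string are hypothetical, and the tool_result shape follows the Anthropic Messages API as an assumption rather than anything stated on this page.

```python
def extract_tool_calls(response: dict) -> list[dict]:
    """Return every tool_use content block from a parsed Messages response."""
    return [block for block in response.get("content", []) if block.get("type") == "tool_use"]


def build_tool_result_message(tool_use_id: str, result_text: str) -> dict:
    """Build the user message that reports a tool's output back to the model."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": result_text}
        ],
    }


# Trimmed copy of the sample response shown above.
response = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "..."},
        {
            "type": "tool_use",
            "id": "toolu_vrtx_01TAJCTkxe8HhRoaQ69N4ouP",
            "name": "text_search_places_api",
            "input": {"textQuery": "Italian restaurants in San Francisco", "openNow": True},
        },
    ],
}

for call in extract_tool_calls(response):
    # The application would call its own places-search implementation here.
    reply = build_tool_result_message(call["id"], "placeholder search results")
    print(call["name"], call["input"], reply["content"][0]["tool_use_id"])
```

In a follow-up request, the messages array would typically contain the original user message, the assistant message with its tool_use content, and then this tool_result message, in that order.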

Use Vertex AI Studio

For some of the Anthropic Claude models, you can use Vertex AI Studio to quickly prototype and test generative AI models in the Google Cloud console. For example, you can use Vertex AI Studio to compare Claude model responses with other supported models such as Google Gemini.

For more information, see Quickstart: Send text prompts to Gemini using Vertex AI Studio.

Anthropic Claude quotas and region availability

Claude models have regional quotas, and models that support the global endpoint also have a global quota. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.

To maintain overall service performance and acceptable use, the maximum quotas might vary by account and, in some cases, access might be restricted. View your project's quotas on the Quotas & System Limits page in the Google Cloud console. You must also have the following quotas available:

online_prediction_requests_per_base_model and global_online_prediction_requests_per_base_model define your QPM quota.
For TPM, there are three quota values that apply to particular models:
- For models that count input and output tokens together, online_prediction_tokens_per_minute_per_base_model and global_online_prediction_tokens_per_minute_per_base_model define the model TPM quota.
- For models that count input and output tokens separately, online_prediction_input_tokens_per_minute_per_base_model and global_online_prediction_input_tokens_per_minute_per_base_model define the input TPM quota, and online_prediction_output_tokens_per_minute_per_base_model and global_online_prediction_output_tokens_per_minute_per_base_model define the output TPM quota.
To see which models count input and output tokens separately, see Quotas by model and region.

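Input tokens

The following list defines the input tokens that can count toward your input TPM quota. The input tokens that each model counts can vary. To see which input tokens a model counts, see Quotas by model and region.

Input tokens includes all input tokens, including cache read and cache write tokens.
Uncached input tokens includes only the input tokens that weren't read from a cache; cache read tokens are excluded.
Cache write tokens includes the tokens that were used to create or update a cache.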
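
To make the three categories concrete, the following sketch totals the input tokens that would count toward an input TPM quota under each counting rule. It is an illustration under stated assumptions: the usage field names (input_tokens, cache_creation_input_tokens, cache_read_input_tokens) are taken from the Anthropic Messages API, where input_tokens is assumed to exclude cached tokens, and the function name and mode strings are hypothetical labels chosen for this example.

```python
def counted_input_tokens(usage: dict, counting: str) -> int:
    """Return the input tokens counted toward quota under a given counting rule."""
    # Assumption: input_tokens holds only uncached input tokens.
    uncached = usage.get("input_tokens", 0)
    cache_write = usage.get("cache_creation_input_tokens", 0)
    cache_read = usage.get("cache_read_input_tokens", 0)

    if counting == "all":
        return uncached + cache_write + cache_read
    if counting == "uncached":
        return uncached
    if counting == "uncached_and_cache_write":
        return uncached + cache_write
    raise ValueError(f"unknown counting rule: {counting!r}")


usage = {"input_tokens": 100, "cache_creation_input_tokens": 50, "cache_read_input_tokens": 1000}
print(counted_input_tokens(usage, "all"))                       # 1150
print(counted_input_tokens(usage, "uncached"))                  # 100
print(counted_input_tokens(usage, "uncached_and_cache_write"))  # 150
```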

Quotas by model and region

The following table shows the default quotas and supported context length for each model in each region.

Model	Region	Quotas	Context length
Claude Opus 4.1
	`us-east5`	QPM: 25; Input TPM: 60,000 (uncached and cache write); Output TPM: 6,000	200,000
	`global endpoint`	QPM: 25; Input TPM: 60,000 (uncached and cache write); Output TPM: 6,000	200,000
Claude Opus 4
	`us-east5`	QPM: 25; Input TPM: 60,000 (uncached and cache write); Output TPM: 6,000	200,000
	`global endpoint`	QPM: 25; Input TPM: 60,000 (uncached and cache write); Output TPM: 6,000	200,000
Claude Sonnet 4.5
	`us-east5`	QPM: 1,500; Input TPM: 1,500,000 (uncached and cache write); Output TPM: 150,000	1,000,000 (beta), 200,000 (GA)
	`europe-west1`	QPM: 1,800; Input TPM: 1,800,000 (uncached and cache write); Output TPM: 180,000	1,000,000 (beta), 200,000 (GA)
	`asia-southeast1`	QPM: 1,500; Input TPM: 1,500,000 (uncached and cache write); Output TPM: 150,000	1,000,000 (beta), 200,000 (GA)
	`global endpoint`	QPM: 1,500; Input TPM: 1,500,000 (uncached and cache write); Output TPM: 150,000	1,000,000 (beta), 200,000 (GA)
Claude Sonnet 4
	`us-east5`	QPM: 35; Input TPM: 280,000 (uncached and cache write); Output TPM: 20,000	1,000,000
	`europe-west1`	QPM: 25; Input TPM: 180,000 (uncached and cache write); Output TPM: 20,000	1,000,000
	`asia-east1`	QPM: 70; Input TPM: 550,000 (uncached and cache write); Output TPM: 50,000	1,000,000
	`global endpoint`	QPM: 35; Input TPM: 276,000 (uncached and cache write); Output TPM: 24,000	1,000,000
Claude 3.7 Sonnet
	`us-east5`	QPM: 55; TPM: 500,000 (uncached input and output)	200,000
	`europe-west1`	QPM: 40; TPM: 300,000 (uncached input and output)	200,000
	`global endpoint`	QPM: 35; TPM: 300,000 (uncached input and output)	200,000
Claude 3.5 Sonnet v2
	`us-east5`	QPM: 90; TPM: 540,000 (input and output)	200,000
	`europe-west1`	QPM: 55; TPM: 330,000 (input and output)	200,000
	`global endpoint`	QPM: 25; TPM: 140,000 (input and output)	200,000
Claude Haiku 4.5
	`us-east5`	QPM: 1,500; Input TPM: 1,500,000 (uncached and cache write); Output TPM: 150,000	200,000
	`europe-west1`	QPM: 1,800; Input TPM: 1,800,000 (uncached and cache write); Output TPM: 180,000	200,000
	`asia-east1`	QPM: 2,500; Input TPM: 2,500,000 (uncached and cache write); Output TPM: 250,000	200,000
	`global endpoint`	QPM: 2,500; Input TPM: 2,500,000 (uncached and cache write); Output TPM: 250,000	200,000
Claude 3.5 Haiku
	`us-east5`	QPM: 80; TPM: 350,000 (input and output)	200,000
	`europe-west1`	QPM: 90; TPM: 400,000 (input and output)	200,000
Claude 3.5 Sonnet
	`us-east5`	QPM: 80; TPM: 350,000 (input and output)	200,000
	`europe-west1`	QPM: 130; TPM: 600,000 (input and output)	200,000
	`asia-southeast1`	QPM: 35; TPM: 150,000 (input and output)	200,000
Claude 3 Opus
	`us-east5`	QPM: 20; TPM: 105,000 (input and output)	200,000
Claude 3 Haiku
	`us-east5`	QPM: 245; TPM: 600,000 (input and output)	200,000
	`europe-west1`	QPM: 75; TPM: 181,000 (input and output)	200,000
	`asia-southeast1`	QPM: 70; TPM: 174,000 (input and output)	200,000
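
Because QPM, input TPM, and output TPM apply at the same time, the tightest of the three determines how many requests per minute a workload can sustain. The following back-of-the-envelope sketch uses the Claude Opus 4.1 us-east5 defaults from the table above; the function name and the average token counts per request are hypothetical workload assumptions.

```python
def sustainable_requests_per_minute(qpm: int, input_tpm: int, output_tpm: int,
                                    avg_input_tokens: int, avg_output_tokens: int) -> int:
    """Return the request rate allowed by the most restrictive of the three quotas."""
    return min(
        qpm,
        input_tpm // avg_input_tokens,    # requests that fit in the input TPM quota
        output_tpm // avg_output_tokens,  # requests that fit in the output TPM quota
    )


# Claude Opus 4.1 in us-east5: QPM 25, input TPM 60,000, output TPM 6,000.
# Assumed workload: 4,000 counted input tokens and 500 output tokens per request.
print(sustainable_requests_per_minute(25, 60_000, 6_000, 4_000, 500))  # 12
```

In this example the output TPM quota is the limiting factor (12 requests), ahead of input TPM (15) and QPM (25).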

To increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see the Cloud Quotas overview.