This document explains how to use partner models on Vertex AI as a managed service.

Vertex AI supports a curated list of models from Google partners, offered as a model as a service (MaaS). When you use a partner model, you send requests to a Vertex AI API endpoint, and the model runs without you needing to provision or manage any infrastructure.

You can explore and deploy partner models in Model Garden. For more information, see Explore AI models in Model Garden. This document focuses on partner models available as a MaaS on Vertex AI. For details about a specific model, see its model card in Model Garden.

Anthropic's Claude and Mistral models are examples of third-party managed models that are available to use on Vertex AI. The partner models listed under Partner models are offered as managed APIs (MaaS) in Vertex AI Model Garden.

Llama 3.1 is a collection of multilingual LLMs that are optimized for multilingual dialogue use cases and that outperform many of the available open source and closed chat models on common industry benchmarks. Llama 3.1 405B is generally available (GA) and is priced per 1M tokens. For details, see pricing. Llama 3.1 8B and Llama 3.1 70B are in Preview at no cost.

For some partner models, you can use provisioned throughput to reserve processing capacity for a fixed fee. You decide on the throughput capacity and the regions in which to reserve that capacity. Because provisioned throughput requests are prioritized over standard pay-as-you-go requests, provisioned throughput provides increased availability: when the system is overloaded, your requests are still processed as long as your throughput stays within your reserved capacity. For more information or to subscribe to the service, contact sales.

You can send requests to partner models using either regional or global endpoints. There is no price difference between regional and global endpoints. However, quotas and supported model capabilities on the global endpoint can differ from those on regional endpoints. For more information, see the related third-party model page.

To use the global endpoint, set the region to global. For the Vertex AI SDK, a regional endpoint is the default; to use the global endpoint, set the region to GLOBAL. For the request URL format, see Specify the global endpoint. The global endpoint isn't available for every partner model; see Supported models.
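Pay-as-you-go pricing quoted per 1M tokens, such as the rate mentioned above for Llama 3.1 405B, scales linearly with the number of tokens processed. The following Python sketch estimates the cost of one request under that model. It assumes that input and output tokens can have different per-1M-token rates (pass the same value twice if a model uses a single rate), and the prices in the example are hypothetical placeholders, not published Vertex AI rates.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # rates are quoted per 1M tokens


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1m: Decimal,
                  output_price_per_1m: Decimal) -> Decimal:
    """Estimate the pay-as-you-go cost of a single request, in dollars.

    Assumption: input and output tokens are billed at separate
    per-1M-token rates. The rates passed in are caller-supplied.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_UNIT * input_price_per_1m
    output_cost = Decimal(output_tokens) / TOKENS_PER_UNIT * output_price_per_1m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical rates: $5.00 per 1M input tokens, $16.00 per 1M output tokens.
    cost = estimate_cost(12_000, 3_000, Decimal("5.00"), Decimal("16.00"))
    print(f"Estimated cost: ${cost}")
```

Decimal is used instead of float so that small per-token fractions don't accumulate rounding error when you sum costs over many requests. Check the pricing page for the actual rates of the model you use.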
Partner models
Model name | Modality | Description | Quickstart
Claude Opus 4.1 | Language, Vision | An industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Ideal for powering frontier agent products and features. | Model card
Claude Opus 4 | Language, Vision | Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. | Model card
Claude Sonnet 4 | Language, Vision | Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents. | Model card
Anthropic's Claude 3.7 Sonnet | Language, Vision | Industry-leading model for coding and powering AI agents, and the first Claude model to offer extended thinking. | Model card
Anthropic's Claude 3.5 Sonnet v2 | Language, Vision | The upgraded Claude 3.5 Sonnet is a state-of-the-art model for real-world software engineering tasks and agentic capabilities. It delivers these advancements at the same price and speed as its predecessor. | Model card
Anthropic's Claude 3.5 Haiku | Language, Vision | Claude 3.5 Haiku, the next generation of Anthropic's fastest and most cost-effective model, is optimal for use cases where speed and affordability matter. | Model card
Anthropic's Claude 3 Opus | Language | A powerful AI model, with top-level performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. | Model card
Anthropic's Claude 3 Haiku | Language | Anthropic's fastest vision and text model for near-instant responses to basic queries, meant for seamless AI experiences mimicking human interactions. | Model card
Anthropic's Claude 3.5 Sonnet | Language | Claude 3.5 Sonnet outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet. | Model card
DeepSeek R1 (0528) (Preview) | Language | DeepSeek's latest version of the DeepSeek R1 model. | Model card
Jamba 1.5 Large (Preview) | Language | AI21 Labs' Jamba 1.5 Large is designed for superior quality responses, high throughput, and competitive pricing compared to other models in its size class. | Model card
Jamba 1.5 Mini (Preview) | Language | AI21 Labs' Jamba 1.5 Mini is well balanced across quality, throughput, and low cost. | Model card
Llama 4 Maverick 17B-128E (GA) | Language, Vision | The largest and most capable Llama 4 model, with coding, reasoning, and image capabilities. Llama 4 Maverick 17B-128E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion. | Model card
Llama 4 Scout 17B-16E (GA) | Language, Vision | Llama 4 Scout 17B-16E delivers state-of-the-art results for its size class, outperforming previous Llama generations and other open and proprietary models on several benchmarks. Llama 4 Scout 17B-16E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion. | Model card
Llama 3.3 (GA) | Language | Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B and to Llama 3.2 90B when used for text-only applications. Moreover, for some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B. | Model card
Llama 3.2 (Preview) | Language, Vision | A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis as well as image captioning. | Model card
Llama 3.1 (GA and Preview) | Language | Multilingual LLMs optimized for dialogue use cases. See the Llama 3.1 availability and pricing notes earlier in this document. | Model card
Mistral OCR (25.05) | Language, Vision | Mistral OCR (25.05) is an Optical Character Recognition API for document understanding. The model comprehends each element of documents such as media, text, tables, and equations. | Model card
Mistral Small 3.1 (25.03) | Language | Mistral Small 3.1 (25.03) is the latest version of Mistral's Small model, featuring multimodal capabilities and extended context length. | Model card
Mistral Large (24.11) | Language | Mistral Large (24.11) is the next version of the Mistral Large (24.07) model, now with improved reasoning and function calling capabilities. | Model card
Codestral (25.01) | Code | A cutting-edge model that's designed for code generation, including fill-in-the-middle and code completion. | Model card
Vertex AI partner model pricing with capacity assurance
Regional and global endpoints
Endpoint type | Description | Pros | Cons
Regional | Requests are served from the specific region you designate. | Helps meet data residency requirements. | Availability is tied to a single region.
Global | Google Cloud can process and serve requests from any region supported by the model. | Improves overall availability and can help reduce errors. | Might result in higher latency in some cases. Not all models or features (like provisioned throughput) are supported.
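Specify the global endpoint
To use the global endpoint, set the region to global. For example, the request URL for a curl command uses the following format:
https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/PUBLISHER_NAME/models/MODEL_NAME
For the Vertex AI SDK, a regional endpoint is the default. To use the global endpoint, set the region to GLOBAL.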
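The only difference between the two endpoint types in the request URL is the host and the location segment. The following Python sketch builds either form. The global format matches the one shown above. The regional host pattern (REGION-aiplatform.googleapis.com) is an assumption based on the usual Vertex AI regional endpoint naming, and the helper name vertex_model_url is hypothetical. The method suffix that follows MODEL_NAME in a real request depends on the model and isn't shown.

```python
GLOBAL_LOCATION = "global"


def vertex_model_url(project_id: str, location: str,
                     publisher: str, model: str) -> str:
    """Build the REST URL for a partner model on Vertex AI.

    location is a region such as "us-east5", or "global"/"GLOBAL" for the
    global endpoint. Assumption: regional hosts follow the
    "{region}-aiplatform.googleapis.com" pattern.
    """
    loc = location.lower()  # accept the SDK-style "GLOBAL" spelling too
    if loc == GLOBAL_LOCATION:
        host = "aiplatform.googleapis.com"
    else:
        host = f"{loc}-aiplatform.googleapis.com"
    return (f"https://{host}/v1/projects/{project_id}/locations/{loc}"
            f"/publishers/{publisher}/models/{model}")


if __name__ == "__main__":
    # Placeholder project and model IDs, for illustration only.
    print(vertex_model_url("my-project", "global", "anthropic", "example-model"))
    print(vertex_model_url("my-project", "us-east5", "anthropic", "example-model"))
```

Keeping both forms behind one function makes it easy to switch a workload between regional and global endpoints, for example when data residency requirements change.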
Supported models
Restrict global API endpoint usage
To enforce the use of regional endpoints, use the constraints/gcp.restrictEndpointUsage organization policy constraint to block requests to the global API endpoint. For more information, see Restricting endpoint usage.
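The organization policy constraint is what actually blocks global-endpoint requests. If you also want client code to fail fast before it sends a request that the policy would reject, a check like the following Python sketch can help. The helper names is_global_endpoint and require_regional are hypothetical, and the check assumes the global endpoint is identified by the aiplatform.googleapis.com host or a global location segment, as in the URL format shown earlier; it doesn't replace the constraint.

```python
from urllib.parse import urlparse

GLOBAL_HOST = "aiplatform.googleapis.com"


def is_global_endpoint(url: str) -> bool:
    """Return True if the URL targets the global Vertex AI endpoint."""
    parsed = urlparse(url)
    segments = parsed.path.split("/")
    # The location ID is the path segment that follows "locations".
    location = None
    if "locations" in segments:
        idx = segments.index("locations")
        if idx + 1 < len(segments):
            location = segments[idx + 1]
    return parsed.hostname == GLOBAL_HOST or location == "global"


def require_regional(url: str) -> str:
    """Raise ValueError for a global-endpoint URL; otherwise return it."""
    if is_global_endpoint(url):
        raise ValueError(f"global endpoint is not allowed: {url}")
    return url
```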
Grant user access to partner models
To enable partner models and make prompt requests, a Google Cloud administrator must set the required permissions and verify that the organization policy allows the use of required APIs.
Set required permissions to use partner models
The following roles and permissions are required to use partner models:
- You must have the Consumer Procurement Entitlement Manager Identity and Access Management (IAM) role. Anyone who's been granted this role can enable partner models in Model Garden.
- You must have the aiplatform.endpoints.predict permission. This permission is included in the Vertex AI User IAM role. For more information, see Vertex AI User and Access control.
Console
1. To grant the required IAM roles to a user, go to the IAM page.
2. In the Principal column, find the user principal for which you want to enable access to partner models, and then click Edit principal in that row.
3. In the Edit access pane, click Add another role. In Select a role, select Consumer Procurement Entitlement Manager.
4. In the Edit access pane, click Add another role. In Select a role, select Vertex AI User.
5. Click Save.
gcloud
1. In the Google Cloud console, activate Cloud Shell.
2. Grant the Consumer Procurement Entitlement Manager role, which is required to enable partner models in Model Garden:
   gcloud projects add-iam-policy-binding PROJECT_ID \
       --member=PRINCIPAL \
       --role=roles/consumerprocurement.entitlementManager
3. Grant the Vertex AI User role, which includes the aiplatform.endpoints.predict permission required to make prompt requests:
   gcloud projects add-iam-policy-binding PROJECT_ID \
       --member=PRINCIPAL \
       --role=roles/aiplatform.user
   Replace PROJECT_ID with your Google Cloud project ID, and replace PRINCIPAL with the identifier for the principal. The identifier takes the form user|group|serviceAccount:email or domain:domain, for example user:cloudysanfrancisco@gmail.com, group:admins@example.com, serviceAccount:test123@example.domain.com, or domain:example.domain.com.
   The output is a list of policy bindings that includes the following:
   - members:
     - PRINCIPAL
     role: roles/consumerprocurement.entitlementManager
For more information, see Grant a single role and gcloud projects add-iam-policy-binding.
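If you grant access to many principals, you might script the two bindings above. The following Python sketch is illustrative only: the helper names grant_commands and missing_roles are hypothetical, grant_commands only builds command strings (it doesn't run gcloud), and missing_roles assumes you pass the parsed JSON output of gcloud projects get-iam-policy PROJECT_ID --format=json, whose bindings entries carry a role and a members list.

```python
import re
import shlex
from typing import List

# Principal formats: user|group|serviceAccount:email or domain:domain.
_PRINCIPAL_RE = re.compile(
    r"^(?:(?:user|group|serviceAccount):[^@\s]+@[^@\s]+|domain:[^@\s:]+)$"
)

REQUIRED_ROLES = (
    "roles/consumerprocurement.entitlementManager",  # enable partner models
    "roles/aiplatform.user",  # includes aiplatform.endpoints.predict
)


def grant_commands(project_id: str, principal: str) -> List[str]:
    """Return one gcloud command per required role, quoted for POSIX shells."""
    if not _PRINCIPAL_RE.match(principal):
        raise ValueError(f"unrecognized principal format: {principal!r}")
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={principal}", f"--role={role}",
        ])
        for role in REQUIRED_ROLES
    ]


def missing_roles(policy: dict, principal: str) -> List[str]:
    """List the required roles that the IAM policy doesn't bind to principal."""
    bound = {
        binding.get("role")
        for binding in policy.get("bindings", [])
        if principal in binding.get("members", [])
    }
    return [role for role in REQUIRED_ROLES if role not in bound]


if __name__ == "__main__":
    for command in grant_commands("my-project", "user:cloudysanfrancisco@gmail.com"):
        print(command)
```

Validating the principal before building the command catches a common mistake, a bare email address without the user:, group:, or serviceAccount: prefix, before gcloud rejects it.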
Set the organization policy for partner model access
To enable partner models, your organization policy must allow the following API: Cloud Commerce Consumer Procurement API (cloudcommerceconsumerprocurement.googleapis.com).
If your organization sets an organization policy to restrict service usage, an organization administrator must verify that the policy allows cloudcommerceconsumerprocurement.googleapis.com, and update the policy if it doesn't.
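The requirement above comes down to an allow/deny list check. The following Python sketch is a deliberately simplified model of that check, useful for reasoning about a policy's allow and deny lists before you change them. The function name service_allowed and its inputs are hypothetical, and real Organization Policy evaluation also involves inheritance from parent resources and settings such as allowing or denying all values, which this sketch ignores. In this sketch, a denied entry always wins, and an allow list, when present, must contain the service.

```python
from typing import Optional, Set

REQUIRED_SERVICE = "cloudcommerceconsumerprocurement.googleapis.com"


def service_allowed(allowed: Optional[Set[str]], denied: Set[str],
                    service: str = REQUIRED_SERVICE) -> bool:
    """Simplified allow/deny-list check for a service-usage restriction.

    allowed: the explicit allow list, or None if the policy has none.
    denied:  the explicit deny list (may be empty).
    """
    if service in denied:
        return False  # an explicit deny blocks the service
    if allowed is not None:
        return service in allowed  # an allow list must name the service
    return True  # no list mentions the service, so it isn't restricted
```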
Also, if you have an organization policy that restricts model usage in Model Garden, the policy must allow access to partner models. For more information, see Control model access.
Partner model regulatory compliance
The certifications for Generative AI on Vertex AI continue to apply when you use partner models as a managed API on Vertex AI. For details about the models themselves, see the respective model card or contact the model publisher.
Your data is stored at rest within the selected region or multi-region for partner models on Vertex AI, but the regionalization of data processing might vary. For a detailed list of partner models' data processing commitments, see Data residency for partner models.
When you use the Vertex AI API, including partner models, your prompts and the model responses are not shared with third parties. Google Cloud processes Customer Data only as instructed by the Customer. For more information, see our Cloud Data Processing Addendum.