Vertex AI partner models for MaaS

Vertex AI supports a curated list of models developed by Google partners. Partner models can be used with Vertex AI as a model as a service (MaaS) and are offered as a managed API. When you use a partner model, you continue to send your requests to Vertex AI endpoints. Partner models are serverless so there's no need to provision or manage infrastructure.

You can discover partner models in Model Garden, and you can also deploy models from there. For more information, see Explore AI models in Model Garden. Each available partner model is described on its model card in Model Garden, but this guide documents only the third-party models that are offered as MaaS on Vertex AI.

Anthropic's Claude and Mistral models are examples of third-party managed models that are available to use on Vertex AI.

Partner models

The following partner models are offered as managed APIs on Vertex AI Model Garden (MaaS):

Model name	Modality	Description	Quickstart
Claude Sonnet 4.5	Language, Vision	Anthropic's mid-sized model for powering real-world agents, with capabilities in coding, computer use, cybersecurity, and working with office files like spreadsheets.	Model card
Claude Opus 4.1	Language, Vision	An industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Ideal for powering frontier agent products and features.	Model card
Claude Haiku 4.5	Language, Vision	Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases, and stands out as one of the best coding models in the world, with the right speed and cost to power free products and high-volume user experiences.	Model card
Claude Opus 4	Language, Vision	Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.	Model card
Claude Sonnet 4	Language, Vision	Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents.	Model card
Anthropic's Claude 3.7 Sonnet	Language, Vision	Industry-leading model for coding and powering AI agents—and the first Claude model to offer extended thinking.	Model card
Anthropic's Claude 3.5 Sonnet v2	Language, Vision	The upgraded Claude 3.5 Sonnet is a state-of-the-art model for real-world software engineering tasks and agentic capabilities. Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.	Model card
Anthropic's Claude 3.5 Haiku	Language, Vision	Claude 3.5 Haiku, the next generation of Anthropic's fastest and most cost-effective model, is optimal for use cases where speed and affordability matter.	Model card
Anthropic's Claude 3 Haiku	Language	Anthropic's fastest vision and text model for near-instant responses to basic queries, meant for seamless AI experiences mimicking human interactions.	Model card
Anthropic's Claude 3.5 Sonnet	Language	Claude 3.5 Sonnet outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet.	Model card
Jamba 1.5 Large (Preview)	Language	AI21 Labs's Jamba 1.5 Large is designed for superior quality responses, high throughput, and competitive pricing compared to other models in its size class.	Model card
Jamba 1.5 Mini (Preview)	Language	AI21 Labs's Jamba 1.5 Mini is well balanced across quality, throughput, and low cost.	Model card
Mistral Medium 3	Language	Mistral Medium 3 is a versatile model designed for a wide range of tasks, including programming, mathematical reasoning, understanding long documents, summarization, and dialogue.	Model card
Mistral OCR (25.05)	Language, Vision	Mistral OCR (25.05) is an Optical Character Recognition API for document understanding. The model comprehends each element of documents such as media, text, tables, and equations.	Model card
Mistral Small 3.1 (25.03)	Language	Mistral Small 3.1 (25.03) is the latest version of Mistral's Small model, featuring multimodal capabilities and extended context length.	Model card
Mistral Large (24.11)	Language	Mistral Large (24.11) is the next version of the Mistral Large (24.07) model now with improved reasoning and function calling capabilities.	Model card
Codestral 2	Language, Code	Codestral 2 is Mistral's code generation specialized model built specifically for high-precision fill-in-the-middle (FIM) completion that helps developers write and interact with code through a shared instruction and completion API endpoint.	Model card
Codestral (25.01)	Code	A cutting-edge model that's designed for code generation, including fill-in-the-middle and code completion.	Model card

Vertex AI partner model pricing with capacity assurance

For some partner models, Google offers provisioned throughput, which reserves throughput capacity for your models for a fixed fee. You decide on the throughput capacity and in which regions to reserve that capacity. Because provisioned throughput requests are prioritized over standard pay-as-you-go requests, provisioned throughput provides increased availability. When the system is overloaded, your requests can still be completed as long as the throughput remains under your reserved throughput capacity. For more information or to subscribe to the service, contact sales.

Regional and global endpoints

For regional endpoints, requests are served from your specified region. In cases where you have data residency requirements or if a model doesn't support the global endpoint, use the regional endpoints.

When you use the global endpoint, Google can process and serve your requests from any region that is supported by the model that you are using, which might result in higher latency in some cases. The global endpoint helps improve overall availability and helps reduce errors.

There is no price difference with the regional endpoints when you use the global endpoint. However, the global endpoint quotas and supported model capabilities can differ from the regional endpoints. For more information, view the related third-party model page.

Specify the global endpoint

To use the global endpoint, set the region to global.

For example, the request URL for a curl command uses the following format: https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/PUBLISHER_NAME/models/MODEL_NAME
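As a minimal sketch, the URL format above could be assembled with a small shell helper. The function name `partner_model_url` is hypothetical. The regional hostname pattern (`LOCATION-aiplatform.googleapis.com`) is an assumption that isn't stated on this page, and the method suffix that a real request appends to the model name is left out because the format above doesn't include one.

```shell
# Hypothetical helper: prints the model resource URL for a partner model.
# Arguments: PROJECT_ID LOCATION PUBLISHER_NAME MODEL_NAME
partner_model_url() {
  local project="$1" location="$2" publisher="$3" model="$4"
  local host
  if [ "$location" = "global" ]; then
    # The global endpoint has no region prefix in the hostname.
    host="aiplatform.googleapis.com"
  else
    # Assumption: regional endpoints prefix the hostname with the region.
    host="${location}-aiplatform.googleapis.com"
  fi
  printf 'https://%s/v1/projects/%s/locations/%s/publishers/%s/models/%s\n' \
    "$host" "$project" "$location" "$publisher" "$model"
}
```

Calling `partner_model_url PROJECT_ID global PUBLISHER_NAME MODEL_NAME` prints the global URL shown above. Passing a region name instead of `global` switches both the hostname and the `locations/` path segment.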

For the Vertex AI SDK, a regional endpoint is the default. Set the region to global to use the global endpoint.

Supported models

The global endpoint is available for the following models:

Restrict global API endpoint usage

To help enforce the use of regional endpoints, use the constraints/gcp.restrictEndpointUsage organization policy constraint to block requests to the global API endpoint. For more information, see Restricting endpoint usage.

Grant user access to partner models

Before you can enable partner models and make a prompt request, a Google Cloud administrator must set the required permissions and verify that the organization policy allows the use of the required APIs.

Set required permissions to use partner models

The following roles and permissions are required to use partner models:

You must have the Consumer Procurement Entitlement Manager Identity and Access Management (IAM) role. Anyone who's been granted this role can enable partner models in Model Garden.
You must have the aiplatform.endpoints.predict permission. This permission is included in the Vertex AI User IAM role. For more information, see Vertex AI User and Access control.

Console

To grant the Consumer Procurement Entitlement Manager and Vertex AI User IAM roles to a user, go to the IAM page.

In the Principal column, find the user principal for which you want to enable access to partner models, and then click Edit principal in that row.
In the Edit access pane, click Add another role.
In Select a role, select Consumer Procurement Entitlement Manager.
In the Edit access pane, click Add another role.
In Select a role, select Vertex AI User.
Click Save.

gcloud

In the Google Cloud console, activate Cloud Shell.

Grant the Consumer Procurement Entitlement Manager role that's required to enable partner models in Model Garden:
```
gcloud projects add-iam-policy-binding PROJECT_ID \
--member=PRINCIPAL --role=roles/consumerprocurement.entitlementManager
```

Grant the Vertex AI User role, which includes the aiplatform.endpoints.predict permission that is required to make prompt requests:
```
gcloud projects add-iam-policy-binding  PROJECT_ID \
--member=PRINCIPAL --role=roles/aiplatform.user
```
Replace PRINCIPAL with the identifier for the principal. The identifier takes the form user|group|serviceAccount:email or domain:domain—for example, user:cloudysanfrancisco@gmail.com, group:admins@example.com, serviceAccount:test123@example.domain.com, or domain:example.domain.com.

The output is a list of policy bindings that includes the following:
```
- members:
  - PRINCIPAL
  role: roles/consumerprocurement.entitlementManager
```
For more information, see Grant a single role and gcloud projects add-iam-policy-binding.
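The two grants above can be combined into one small script. The following is a sketch only: `is_valid_member` and `print_grant_commands` are hypothetical helper names, and the script prints the `gcloud` commands instead of running them so that you can review them first.

```shell
# Hypothetical helper: returns 0 if the identifier uses one of the member
# prefixes described above (user, group, serviceAccount, or domain).
is_valid_member() {
  case "$1" in
    user:?*|group:?*|serviceAccount:?*|domain:?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints (does not run) the two role grants.
# Arguments: PROJECT_ID PRINCIPAL
print_grant_commands() {
  local project="$1" member="$2" role
  if ! is_valid_member "$member"; then
    echo "invalid member identifier: $member" >&2
    return 1
  fi
  for role in roles/consumerprocurement.entitlementManager roles/aiplatform.user; do
    echo "gcloud projects add-iam-policy-binding $project --member=$member --role=$role"
  done
}
```

For example, `print_grant_commands PROJECT_ID group:admins@example.com` prints one `add-iam-policy-binding` command per role. The prefix check only catches a missing or misspelled member type; it doesn't confirm that the principal exists.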

Set the organization policy for partner model access

To enable partner models, your organization policy must allow the following API: Cloud Commerce Consumer Procurement API - cloudcommerceconsumerprocurement.googleapis.com

If your organization sets an organization policy to restrict service usage, then an organization administrator must verify that cloudcommerceconsumerprocurement.googleapis.com is allowed by setting the organization policy.

Also, if you have an organization policy that restricts model usage in Model Garden, the policy must allow access to partner models. For more information, see Control model access.
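One quick, partial check is whether the Cloud Commerce Consumer Procurement API is currently enabled in your project. The sketch below assumes a hypothetical helper, `has_required_api`, that reads a list of service names from standard input. It doesn't inspect the organization policy itself: an API that isn't enabled might be blocked by policy, or it might not have been enabled yet.

```shell
REQUIRED_API="cloudcommerceconsumerprocurement.googleapis.com"

# Hypothetical helper: reads service names on stdin, one per line, and
# returns 0 only if the required API is among them (exact line match).
has_required_api() {
  grep -qxF "$REQUIRED_API"
}

# Example usage (requires gcloud and an authenticated session):
#   gcloud services list --enabled --project=PROJECT_ID \
#     --format="value(config.name)" | has_required_api \
#     && echo "enabled" || echo "not enabled"
```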

Partner model regulatory compliance

The certifications for Generative AI on Vertex AI continue to apply when partner models are used as a managed API using Vertex AI. If you need details about the models themselves, additional information can be found in the respective Model Card, or you can contact the respective model publisher.

Your data is stored at rest within the selected region or multi-region for partner models on Vertex AI, but the regionalization of data processing may vary. For a detailed list of partner models' data processing commitments, see Data residency for partner models.

Customer prompts and model responses are not shared with third parties when using the Vertex AI API, including partner models. Google only processes Customer Data as instructed by the Customer, which is further described in our Cloud Data Processing Addendum.