Vertex AI open models for MaaS

Vertex AI supports a curated list of open models as managed models. You can use these open models with Vertex AI as a model as a service (MaaS); they are offered as managed APIs. When you use a managed open model, you continue to send your requests to Vertex AI endpoints. Managed open models are serverless, so there's no need to provision or manage infrastructure.

You can discover and deploy managed open models in Model Garden. For more information, see Explore AI models in Model Garden.

Open models

The following open models are offered as managed APIs on Vertex AI Model Garden (MaaS):

Model name Modality Description Quickstart
gpt-oss 120B Language A 120B model that offers high performance on reasoning tasks. Model card
gpt-oss 20B Language A 20B model optimized for efficiency and deployment on consumer and edge hardware. Model card
Qwen3-Next-80B Thinking Language, Code A model from the Qwen3-Next family of models, specialized for complex problem-solving and deep reasoning. Model card
Qwen3-Next-80B Instruct Language, Code A model from the Qwen3-Next family of models, specialized for following specific commands. Model card
Qwen3 Coder Language, Code An open-weight model developed for advanced software development tasks. Model card
Qwen3 235B Language An open-weight model with a "hybrid thinking" capability to switch between methodical reasoning and rapid conversation. Model card
DeepSeek-V3.1 Language DeepSeek's hybrid model that supports both thinking mode and non-thinking mode. Model card
DeepSeek R1 (0528) Language DeepSeek's latest version of the DeepSeek R1 model. Model card
Llama 4 Maverick 17B-128E Language, Vision The largest and most capable Llama 4 model that has coding, reasoning, and image capabilities. Llama 4 Maverick 17B-128E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion. Model card
Llama 4 Scout 17B-16E Language, Vision Llama 4 Scout 17B-16E delivers state-of-the-art results for its size class, outperforming previous Llama generations and other open and proprietary models on several benchmarks. Llama 4 Scout 17B-16E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion. Model card
Llama 3.3 Language Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B and to Llama 3.2 90B when used for text-only applications. Moreover, for some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B. Model card
Llama 3.2 (Preview) Language, Vision A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis as well as image captioning. Model card
Llama 3.1 Language A collection of multilingual LLMs optimized for multilingual dialogue use cases that outperform many of the available open source and closed chat models on common industry benchmarks. Llama 3.1 405B is generally available (GA); Llama 3.1 8B and Llama 3.1 70B are in Preview. Model card

Regional and global endpoints

For regional endpoints, requests are served from your specified region. Use regional endpoints when you have data residency requirements or when a model doesn't support the global endpoint.

When you use the global endpoint, Google can process and serve your requests from any region that is supported by the model that you are using. This might result in higher latency in some cases. The global endpoint helps improve overall availability and helps reduce errors.

Pricing is the same whether you use the global endpoint or a regional endpoint. However, global endpoint quotas and supported model capabilities can differ from those of the regional endpoints. For more information, view the related third-party model page.

Specify the global endpoint

To use the global endpoint, set the region to global.

For example, the request URL for a curl command uses the following format: https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/PUBLISHER_NAME/models/MODEL_NAME
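As a sketch, the following composes the global-endpoint URL from placeholder values and shows what an authenticated call might look like. The curl invocation is commented out because it requires gcloud credentials, and the streamRawPredict method and request body are assumptions that can vary by model; check the model card for the exact request format.

```shell
# PROJECT_ID, PUBLISHER_NAME, and MODEL_NAME are placeholders; substitute your
# own project ID and a model from Model Garden.
PROJECT_ID="PROJECT_ID"
PUBLISHER_NAME="PUBLISHER_NAME"
MODEL_NAME="MODEL_NAME"
URL="https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/publishers/${PUBLISHER_NAME}/models/${MODEL_NAME}"
echo "${URL}"

# Authenticated request (requires gcloud credentials; the method name and
# request body depend on the model -- check its model card):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   "${URL}:streamRawPredict" \
#   -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```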

For the Vertex AI SDK, a regional endpoint is the default. Set the region to GLOBAL to use the global endpoint.

Restrict global API endpoint usage

To help enforce the use of regional endpoints, use the constraints/gcp.restrictEndpointUsage organization policy constraint to block requests to the global API endpoint. For more information, see Restricting endpoint usage.
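A minimal sketch of such a policy, assuming you administer organization ORG_ID. The denied value shown, aiplatform.googleapis.com, is the global Vertex AI endpoint; verify the exact value and policy schema against the constraint's documentation before applying.

```shell
# Write a policy file that denies the global Vertex AI endpoint so that only
# regional endpoints can be used. ORG_ID is a placeholder.
cat > restrict-endpoint-policy.yaml <<'EOF'
name: organizations/ORG_ID/policies/gcp.restrictEndpointUsage
spec:
  rules:
  - values:
      deniedValues:
      - aiplatform.googleapis.com
EOF

# Apply it (requires the Organization Policy Administrator role):
# gcloud org-policies set-policy restrict-endpoint-policy.yaml
```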

Grant user access to open models

Before you can enable open models and make prompt requests, a Google Cloud administrator must set the required permissions and verify that the organization policy allows the use of the required APIs.

Set required permissions to use open models

The following roles and permissions are required to use open models:

  • You must have the Consumer Procurement Entitlement Manager Identity and Access Management (IAM) role. Anyone who's been granted this role can enable open models in Model Garden.

  • You must have the aiplatform.endpoints.predict permission. This permission is included in the Vertex AI User IAM role. For more information, see Vertex AI User and Access control.

Console

  1. To grant the Consumer Procurement Entitlement Manager IAM role to a user, go to the IAM page.

    Go to IAM

  2. In the Principal column, find the user principal for which you want to enable access to open models, and then click Edit principal in that row.

  3. In the Edit access pane, click Add another role.

  4. In Select a role, select Consumer Procurement Entitlement Manager.

  5. In the Edit access pane, click Add another role.

  6. In Select a role, select Vertex AI User.

  7. Click Save.

gcloud

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  2. Grant the Consumer Procurement Entitlement Manager role that's required to enable open models in Model Garden:

    gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=PRINCIPAL --role=roles/consumerprocurement.entitlementManager
    
  3. Grant the Vertex AI User role, which includes the aiplatform.endpoints.predict permission that's required to make prompt requests:

    gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=PRINCIPAL --role=roles/aiplatform.user
    

    Replace PRINCIPAL with the identifier for the principal. The identifier takes the form user|group|serviceAccount:email or domain:domain. For example: user:cloudysanfrancisco@gmail.com, group:admins@example.com, serviceAccount:test123@example.domain.com, or domain:example.domain.com.

    The output is a list of policy bindings that includes the following:

    - members:
      - user:PRINCIPAL
      role: roles/consumerprocurement.entitlementManager
    

    For more information, see Grant a single role and gcloud projects add-iam-policy-binding.
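The PRINCIPAL formats described above can be sanity-checked before you run the binding commands. This is a hypothetical helper for illustration, not part of gcloud:

```shell
# Hypothetical helper: report whether an identifier uses one of the principal
# prefixes that --member accepts (user:, group:, serviceAccount:, domain:).
is_valid_principal() {
  case "$1" in
    user:*|group:*|serviceAccount:*|domain:*) echo "valid" ;;
    *) echo "invalid" ;;
  esac
}

is_valid_principal "user:cloudysanfrancisco@gmail.com"   # valid
is_valid_principal "cloudysanfrancisco@gmail.com"        # invalid (missing prefix)
```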

Set the organization policy for open model access

To enable open models, your organization policy must allow the following API: Cloud Commerce Consumer Procurement API - cloudcommerceconsumerprocurement.googleapis.com

If your organization sets an organization policy to restrict service usage, then an organization administrator must verify that cloudcommerceconsumerprocurement.googleapis.com is allowed by setting the organization policy.

Also, if you have an organization policy that restricts model usage in Model Garden, the policy must allow access to open models. For more information, see Control model access.

Open model regulatory compliance

The certifications for Generative AI on Vertex AI continue to apply when open models are used as a managed API on Vertex AI. For details about the models themselves, see the respective model card or contact the model publisher.

Your data is stored at rest within the selected region or multi-region for open models on Vertex AI, but the regionalization of data processing may vary. For a detailed list of open models' data processing commitments, see Data residency for open models.

Customer prompts and model responses are not shared with third parties when using the Vertex AI API, including open models. Google only processes customer data as instructed by the customer, which is further described in our Cloud Data Processing Addendum.