Vertex AI open models for MaaS

Vertex AI supports a curated list of open models as managed models. These open models can be used with Vertex AI as a model as a service (MaaS) and are offered as a managed API. When you use a managed open model, you continue to send your requests to Vertex AI endpoints. Managed open models are serverless so there's no need to provision or manage infrastructure.

Managed open models can be discovered using Model Garden. You can also deploy models using Model Garden. For more information, see Explore AI models in Model Garden.

Before you can use open models, you need to grant user access to open models.

Open models

The following open models are offered as managed APIs on Vertex AI Model Garden (MaaS):

Model name	Modality	Description	Quickstart
gpt-oss 120B	Language	A 120B model that offers high performance on reasoning tasks.	Model card
gpt-oss 20B	Language	A 20B model optimized for efficiency and deployment on consumer and edge hardware.	Model card
Qwen3-Next-80B Thinking	Language, Code	A model from the Qwen3-Next family of models, specialized for complex problem-solving and deep reasoning.	Model card
Qwen3-Next-80B Instruct	Language, Code	A model from the Qwen3-Next family of models, specialized for for following specific commands.	Model card
Qwen3 Coder	Language, Code	An open-weight model developed for advanced software development tasks.	Model card
Qwen3 235B	Language	An open-weight model with a "hybrid thinking" capability to switch between methodical reasoning and rapid conversation.	Model card
DeepSeek-V3.1	Language	DeepSeek's hybrid model that supports both thinking mode and non-thinking mode.	Model card
DeepSeek R1 (0528)	Language	DeepSeek's latest version of the DeepSeek R1 model.	Model card
Llama 4 Maverick 17B-128E	Language, Vision	The largest and most capable Llama 4 model that has coding, reasoning, and image capabilities. Llama 4 Maverick 17B-128E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion.	Model card
Llama 4 Scout 17B-16E	Language, Vision	Llama 4 Scout 17B-16E delivers state-of-the-art results for its size class, outperforming previous Llama generations and other open and proprietary models on several benchmarks. Llama 4 Scout 17B-16E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion.	Model card
Llama 3.3	Language	Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B and to Llama 3.2 90B when used for text-only applications. Moreover, for some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B.	Model card
Llama 3.2 (Preview)	Language, Vision	A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis as well as image captioning.	Model card
Llama 3.1	Language	A collection of multilingual LLMs optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. Llama 3.1 405B is generally available (GA). Llama 3.1 8B and Llama 3.1 70B are in Preview.	Model card

The following open embedding models are offered as managed APIs on Vertex AI Model Garden (MaaS):

Model name	Description	Output dimensions	Max sequence length	Supported text languages	Quickstart
multilingual-e5-small	Part of the E5 family of text embedding models. Small variant contains 12 layers.	Up to 384	512 tokens	Supported languages	Model card
multilingual-e5-large	Part of the E5 family of text embedding models. Large variant contains 24 layers.	Up to 1024	512 tokens	Supported languages	Model card

Open model regulatory compliance

The certifications for Generative AI on Vertex AI continue to apply when open models are used as a managed API using Vertex AI. If you need details about the models themselves, additional information can be found in the respective model card, or you can contact the respective model publisher.

Your data is stored at rest within the selected region or multi-region for open models on Vertex AI, but the regionalization of data processing may vary. For a detailed list of open models' data processing commitments, see Data residency for open models.

Customer prompts and model responses are not shared with third parties when using the Vertex AI API, including open models. Google only processes customer data as instructed by the customer, which is further described in our Cloud Data Processing Addendum.

What's next

Before using open models, Grant user access to open models.
Learn how to Call open model APIs.