Vertex AI supports a curated list of partner and open models as managed models. These models can be used with Vertex AI as a model as a service (MaaS) and are offered as a managed API. When you use a managed model, you continue to send your requests to Vertex AI endpoints. Managed models are serverless so there's no need to provision or manage infrastructure.
Managed models can be discovered using Model Garden. You can also deploy models using Model Garden. For more information, see Explore AI models in Model Garden.
Partner models
The following partner models are offered as managed APIs on Vertex AI Model Garden (MaaS):
Model name | Modality | Description | Quickstart |
---|---|---|---|
Claude Opus 4.1 | Language, Vision | An industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Ideal for powering frontier agent products and features. | Model card |
Claude Opus 4 | Language, Vision | Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. | Model card |
Claude Sonnet 4 | Language, Vision | Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents. | Model card |
Anthropic's Claude 3.7 Sonnet | Language, Vision | Industry-leading model for coding and powering AI agents—and the first Claude model to offer extended thinking. | Model card |
Anthropic's Claude 3.5 Sonnet v2 | Language, Vision | The upgraded Claude 3.5 Sonnet is a state-of-the-art model for real-world software engineering tasks and agentic capabilities. Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor. | Model card |
Anthropic's Claude 3.5 Haiku | Language, Vision | Claude 3.5 Haiku, the next generation of Anthropic's fastest and most cost-effective model, is optimal for use cases where speed and affordability matter. | Model card |
Anthropic's Claude 3 Haiku | Language | Anthropic's fastest vision and text model for near-instant responses to basic queries, meant for seamless AI experiences mimicking human interactions. | Model card |
Anthropic's Claude 3.5 Sonnet | Language | Claude 3.5 Sonnet outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet. | Model card |
Jamba 1.5 Large (Preview) | Language | AI21 Labs's Jamba 1.5 Large is designed for superior quality responses, high throughput, and competitive pricing compared to other models in its size class. | Model card |
Jamba 1.5 Mini (Preview) | Language | AI21 Labs's Jamba 1.5 Mini is well balanced across quality, throughput, and low cost. | Model card |
Mistral OCR (25.05) | Language, Vision | Mistral OCR (25.05) is an Optical Character Recognition API for document understanding. The model comprehends each element of documents such as media, text, tables, and equations. | Model card |
Mistral Small 3.1 (25.03) | Language | Mistral Small 3.1 (25.03) is the latest version of Mistral's Small model, featuring multimodal capabilities and extended context length. | Model card |
Mistral Large (24.11) | Language | Mistral Large (24.11) is the next version of the Mistral Large (24.07) model now with improved reasoning and function calling capabilities. | Model card |
Codestral (25.01) | Code | A cutting-edge model that's designed for code generation, including fill-in-the-middle and code completion. | Model card |
Open models
The following open models are offered as managed APIs on Vertex AI Model Garden (MaaS):
Model name | Modality | Description | Quickstart |
---|---|---|---|
gpt-oss 120B | Language | A 120B model that offers high performance on reasoning tasks. | Model card |
gpt-oss 20B | Language | A 20B model optimized for efficiency and deployment on consumer and edge hardware. | Model card |
Qwen3-Next-80B Thinking | Language, Code | A model from the Qwen3-Next family of models, specialized for complex problem-solving and deep reasoning. | Model card |
Qwen3-Next-80B Instruct | Language, Code | A model from the Qwen3-Next family of models, specialized for for following specific commands. | Model card |
Qwen3 Coder | Language, Code | An open-weight model developed for advanced software development tasks. | Model card |
Qwen3 235B | Language | An open-weight model with a "hybrid thinking" capability to switch between methodical reasoning and rapid conversation. | Model card |
DeepSeek-V3.1 | Language | DeepSeek's hybrid model that supports both thinking mode and non-thinking mode. | Model card |
DeepSeek R1 (0528) | Language | DeepSeek's latest version of the DeepSeek R1 model. | Model card |
Llama 4 Maverick 17B-128E | Language, Vision | The largest and most capable Llama 4 model that has coding, reasoning, and image capabilities. Llama 4 Maverick 17B-128E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion. | Model card |
Llama 4 Scout 17B-16E | Language, Vision | Llama 4 Scout 17B-16E delivers state-of-the-art results for its size class, outperforming previous Llama generations and other open and proprietary models on several benchmarks. Llama 4 Scout 17B-16E is a multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion. | Model card |
Llama 3.3 | Language | Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B and to Llama 3.2 90B when used for text-only applications. Moreover, for some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B. | Model card |
Llama 3.2 (Preview) | Language, Vision | A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis as well as image captioning. | Model card |
Llama 3.1 | Language |
A collection of multilingual LLMs optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. Llama 3.1 405B is generally available (GA). Llama 3.1 8B and Llama 3.1 70B are in Preview. |
Model card |