The Gemini 2 models are the latest Google models supported in Vertex AI. This page goes over the following models:
If you're looking for information on our Gemini 2.5 Pro Experimental model, visit our Gemini 2.5 Pro Experimental documentation.
Models
2.0 Flash
Gemini 2.0 Flash is our latest generally available model in the Gemini family. It's our workhorse model for all daily tasks, featuring enhanced performance and support for the real-time Live API.
Gemini 2.0 Flash introduces the following new and enhanced features:
- Multimodal Live API: This new API enables low-latency bidirectional voice and video interactions with Gemini. A minimal Python sketch follows this list.
- Quality: Enhanced performance across most quality benchmarks.
- Improved agentic capabilities: 2.0 Flash delivers improvements to multimodal understanding, coding, complex instruction following, and function calling. These improvements work together to support better agentic experiences.
- New modalities: 2.0 Flash introduces built-in image generation and controllable text-to-speech capabilities, enabling image editing, localized artwork creation, and expressive storytelling.
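For example, here is a minimal text-only Live API session using the Gen AI SDK for Python. This is a sketch rather than a definitive implementation: the project, location, and Live-enabled model ID are placeholder assumptions, and session method names can change while the API is in preview.

import asyncio
from google import genai

# Placeholder project and location values; replace with your own.
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

async def main():
    # Open a low-latency bidirectional session; the Live-enabled model ID is an assumption.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp",
        config={"response_modalities": ["TEXT"]},
    ) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        # Stream the model's reply back as it is generated.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())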
Gemini 2.0 Flash features:
- Multimodal input
- Text output (generally available) / multimodal output (private preview)
- Prompt optimizers
- Controlled generation
- Function calling
- Grounding with Google Search
- Code execution
- Count tokens
Use this model ID to use Gemini 2.0 Flash with the Gen AI SDK: gemini-2.0-flash
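For example, a minimal text generation call looks like the following sketch; the project and location values are placeholders for your own settings.

from google import genai

# Placeholder project and location values; replace with your own.
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain the benefits of a long context window in two sentences.",
)
print(response.text)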
Feature availability
The following features are available for Gemini 2.0 Flash:
Feature | Availability level |
---|---|
Text generation | Generally available |
Grounding with Google Search | Generally available |
Gen AI SDK | Generally available |
Multimodal Live API | Public preview |
Bounding box detection | Public preview |
Image generation | Private preview |
Speech generation | Private preview |
- Generally available: This feature is available publicly and supported for use in production-level code.
- Public preview: This feature is available publicly in a reduced capacity. Don't use features that are released as a public preview in production code, because the support level and functionality of that feature can change without warning.
- Private preview: This feature is only available to users listed on an approved allow-list. Don't use features that are released as a private preview in production code, because the support level and functionality of that feature can change without warning.
Pricing
Information on the pricing for Gemini 2.0 Flash is available on our Pricing page.
Quotas and limitations
GA features in Gemini 2.0 Flash use dynamic shared quota.
Grounding with Google Search in Gemini 2.0 Flash is subject to rate limiting.
2.0 Flash-Lite
Gemini 2.0 Flash-Lite is our fastest and most cost-efficient Flash model.
Gemini 2.0 Flash-Lite includes:
- Multimodal input, text output
- 1M token input context window
- 8k token output context window
2.0 Flash-Lite does not include the following features:
- Multimodal output generation
- Integration with Multimodal Live API
- Thinking mode
- Built-in tool usage
Use this model ID to use Gemini 2.0 Flash-Lite with the Gen AI SDK: gemini-2.0-flash-lite
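As a quick sketch, you can cap the response length with a generation config so that it stays within the 8k output token limit; the project and location values are placeholders.

from google import genai
from google.genai import types

# Placeholder project and location values; replace with your own.
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.0-flash-lite",
    contents="Summarize the trade-offs between speed and quality in model serving.",
    # Keep the response well under the model's 8k output token limit.
    config=types.GenerateContentConfig(max_output_tokens=512),
)
print(response.text)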
2.5 Pro Experimental
Gemini 2.5 Pro Experimental is our most advanced reasoning Gemini model, especially for multimodal understanding, coding, and world knowledge. Gemini 2.5 Pro Experimental is available as an experimental model in Vertex AI.
Gemini 2.5 Pro Experimental features:
- Thinking and reasoning
- Multimodal input
- Text output
- Prompt optimizers
- Controlled generation
- Function calling (excluding compositional function calling)
- Grounding with Google Search
- Code execution
- Count tokens
Use this model ID to use Gemini 2.5 Pro Experimental with the Gen AI SDK: gemini-2.5-pro-exp-03-25
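For example, the following sketch enables Grounding with Google Search for this model; the project and location values are placeholders, and configuration details can change while the model is experimental.

from google import genai
from google.genai import types

# Placeholder project and location values; replace with your own.
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents="What are the latest announcements from Google Cloud?",
    # Ground the response in Google Search results.
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)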
Quotas and limitations
Gemini 2.5 Pro Experimental is rate-limited to 10 queries per minute (QPM) during the Experimental stage.
Grounding with Google Search in Gemini 2.5 Pro Experimental is subject to rate limiting.
Google Gen AI SDK
The Gen AI SDK provides a unified interface to Gemini 2.0 through both the Gemini Developer API and the Gemini API on Vertex AI. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.
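For example, in the following sketch only the client construction differs between the two platforms; the API key, project, and location values are placeholders.

from google import genai

# Gemini Developer API: authenticate with an API key (placeholder value).
dev_client = genai.Client(api_key="YOUR_API_KEY")

# Gemini API on Vertex AI: authenticate with a Google Cloud project and location (placeholders).
vertex_client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

# The same request code works against either client.
response = vertex_client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Say hello.",
)
print(response.text)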
The SDK is generally available in Python. Support for Go is in Preview, and Java and JavaScript support is coming soon.
You can start using the SDK as shown below.
Gen AI SDK for Python
Install
pip install --upgrade google-genai
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
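With those variables set, the client can be constructed with no arguments and reads the Vertex AI settings from the environment. A minimal sketch:

from google import genai

# Reads GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, and
# GOOGLE_CLOUD_LOCATION from the environment variables set above.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="How does AI work?",
)
print(response.text)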