Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Migrate your application to Gemini 2 with the Gemini API in Vertex AI
This guide shows how to migrate generative AI applications from Gemini 1.x and PaLM models to Gemini 2 models.
Why migrate to Gemini 2?
Gemini 2 delivers significant performance improvements over Gemini 1.x and PaLM models, along with new capabilities. Additionally, each model version has its own version support and availability timeline.
Upgrading most generative AI applications to Gemini 2 shouldn't require significant reengineering of prompts or code. But some applications require prompt changes, and these changes are difficult to predict without running a prompt through Gemini 2 first. Therefore, Gemini 2 testing is recommended before migration.
Significant code changes are only needed for certain breaking changes, or to use new Gemini 2 capabilities.
Which Gemini 2 model should I migrate to?
As you choose a Gemini 2 model to migrate to, you'll want to consider the features that your application requires, as well as the cost of those features.
For an overview of Gemini 2 model features, see Gemini 2. For an overview of all Google models, see Google models.
This document outlines an eight-step process for migrating your application to Gemini 2. The following sections describe each step.
Before you begin
Step 1: Complete prerequisites
For a seamless Gemini 2 migration, we recommend that you address the following concerns before you begin the migration process.
Model retirement awareness: Note the model version support and availability timelines for older Gemini models, and make sure your migration is completed before the model you're using is retired.
InfoSec, governance, and regulatory approvals: Proactively request the approvals you need for Gemini 2 from your information security (InfoSec), risk, and compliance stakeholders. Make sure that you cover domain-specific risk and compliance constraints, especially in heavily regulated industries such as healthcare and financial services. Note that Gemini security controls differ among Gemini 2 models.
Modality and tokenization-based pricing differences: Check Gemini 2 pricing for all the modalities (text, code, images, speech) in your application. For more information, see the generative AI pricing page. Note that Gemini 2 text input and output is priced per token, while Gemini 1 text input and output is priced per character.
Supervised fine-tuning: If your Gemini application uses supervised fine-tuning, submit a new tuning job with Gemini 2. We recommend that you start with the default tuning hyperparameters instead of reusing the hyperparameter values that you used with previous Gemini versions. The tuning service has been optimized for Gemini 2, so reusing previous hyperparameter values might not yield the best results. A minimal tuning-job sketch appears after this list.
Regression testing: There are three main types of regression tests involved when upgrading to Gemini 2 models:
Code regression tests: Regression testing from a software engineering and DevOps perspective. This type of regression test is always required.
Model performance regression tests: Regression testing from a data science or machine learning perspective. This means ensuring that the new Gemini 2 model provides outputs that are at least as high-quality as outputs from the current production model. Model performance regression tests are just model evaluations done as part of a change to a system or to the underlying model. Model performance regression testing further breaks down into:
Offline model performance testing: Assessing the quality of model outputs in a dedicated experimentation environment based on various model output quality metrics.
Online model performance testing: Assessing the quality of model outputs in a live online deployment based on implicit or explicit user feedback.
Load testing: Assessing how the application handles high volumes of inference requests. This type of regression test is required for applications that require Provisioned Throughput.
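For the fine-tuning prerequisite above, the following is a minimal sketch of submitting a new tuning job through the Gen AI SDK. The base model, Cloud Storage path, and display name are placeholder assumptions; no hyperparameters are overridden, so the Gemini 2 tuning defaults apply.

```python
from google import genai
from google.genai import types

# Assumes the GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and
# GOOGLE_GENAI_USE_VERTEXAI environment variables are set (see step 3).
client = genai.Client()

tuning_job = client.tunings.tune(
    base_model="gemini-2.0-flash-001",  # placeholder: choose a tunable Gemini 2 model
    training_dataset=types.TuningDataset(
        gcs_uri="gs://your-bucket/training-data.jsonl",  # hypothetical dataset path
    ),
    # No hyperparameter overrides: start from the Gemini 2 tuning defaults.
    config=types.CreateTuningJobConfig(
        tuned_model_display_name="gemini-2-migration-tuning",  # hypothetical name
    ),
)
print(tuning_job.name, tuning_job.state)
```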
Step 2: Document evaluation and testing requirements
Prepare to repeat any relevant evaluations from when you originally built your application, along with any relevant evaluations you have done since then.
If you feel your existing evaluations don't appropriately cover or measure the breadth of tasks that your application performs, you should design and prepare additional evaluations.
If your application involves RAG, tool use, complex agentic workflows, or prompt chains, make sure that your existing evaluation data allows for assessing each component independently. If not, gather input-output examples for each component, as illustrated after this list.
If your application is especially high-impact, or if it's part of a larger user-facing real-time system, you should include online evaluation.
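If you need to assemble per-component examples, the records below are a hypothetical illustration for a RAG application. The field names and contents are assumptions, not a required schema.

```python
# Hypothetical per-component evaluation records for a RAG application.
# Scoring each component independently makes it easier to localize any
# quality regression after the migration.
retrieval_examples = [
    # Given a query, which documents should the retriever return?
    {"query": "What is our refund window?", "expected_doc_ids": ["policy-12"]},
]
generation_examples = [
    # Given retrieved context, what should the model generate?
    {
        "prompt": (
            "Answer using only the context below.\n"
            "Context: Refunds are accepted within 30 days of purchase.\n"
            "Question: What is our refund window?"
        ),
        "reference": "Refunds are accepted within 30 days of purchase.",
    },
]
```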
Step 3: Upgrade and test code
If your Gemini 1.x application uses the Vertex AI SDK, consider upgrading to the Gen AI SDK. New Gemini 2 capabilities are only available in the Gen AI SDK. However, there is no need to switch to the Gen AI SDK if your application only requires capabilities that are available in the Vertex AI SDK.
| Feature / Aspect | Gen AI SDK | Vertex AI SDK |
| --- | --- | --- |
| Primary Focus | Optimized for Gemini models and new generative AI features. | Broader Vertex AI platform, including conventional ML workflows. |
| Gemini 2 Feature Support | Full support for all new Gemini 2 capabilities. | Limited support; new features are not added. |
| Recommendation | Recommended for new applications or when using advanced Gemini 2 features. | Suitable if your application only uses features common to both SDKs and is already integrated. |
We recommend that you migrate to the Gen AI SDK when upgrading to Gemini 2. If you choose to use the Gen AI SDK, the setup process differs from the Vertex AI SDK. For more information, see Google Gen AI SDK.
Set environment variables to use the Gen AI SDK with Vertex AI:
```bash
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
```
Replace GOOGLE_CLOUD_PROJECT with your Google Cloud project ID, and replace GOOGLE_CLOUD_LOCATION with the location of your Google Cloud project (for example, us-central1).

The following is a short code sample that uses the Gen AI SDK for Python:

```python
from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How does AI work?",
)
print(response.text)
# Example response:
# Okay, let's break down how AI works. It's a broad field, so I'll focus on the ...
# Here's a simplified overview:
# ...
```
The following is a short code sample that uses the Vertex AI SDK for Python:
```python
import vertexai
from vertexai.generative_models import GenerativeModel

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-2.0-flash-001")
response = model.generate_content(
    "What's a good name for a flower shop that specializes in selling bouquets of dried flowers?"
)
print(response.text)
# Example response:
# **Emphasizing the Dried Aspect:**
# * Everlasting Blooms
# * Dried & Delightful
# * The Petal Preserve
# ...
```
Replace PROJECT_ID with your Google Cloud project ID. When you migrate an existing application, change the model ID in your code from the older model (for example, gemini-1.5-flash-002) to a Gemini 2 model (for example, gemini-2.0-flash).
Change your Gemini calls
Change your prediction code to use Gemini 2. At a minimum, this means changing the model name to a Gemini 2 model at the point where your code loads the model.
The exact code change will differ depending on how you originally implemented your application, and especially whether you used the Gen AI SDK or the Vertex AI SDK.
After you make your code changes, perform code regression testing and other software tests on your code to make sure that it runs. This test is only meant to assess whether the code functions correctly. It's not meant to assess the quality of model responses.
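As a sketch of that code-level check, the following pytest-style smoke test only verifies that the migrated call path runs and returns text; the model ID and prompt are illustrative.

```python
from google import genai

def test_generate_content_smoke():
    # Assumes the environment variables from earlier in this step are set.
    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # the Gemini 2 model you migrated to
        contents="Reply with the single word: pong",
    )
    # Functional check only: a non-empty response means the call path works.
    # Response quality is assessed separately in steps 4 and 7.
    assert response.text
```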
Top-K token sampling parameter: Models after gemini-1.0-pro-vision don't support changing the Top-K parameter.
Focus only on code changes in this step. You may need to make other changes, but wait until you start your evaluation, and then consider the following adjustments based on evaluation results:
If you're switching from dynamic retrieval, you might need to experiment with system instructions to control when the Google Search tool is used (for example, "Only generate queries for the Google Search tool if the user asks about sports. Don't generate queries for any other topic."). Wait until you evaluate before making prompt changes.
If you used the Top-K parameter, adjust other token sampling parameters, such as Top-P, to achieve similar results; see the sketch after this list.
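For example, if your Gemini 1.x code set Top-K, one way to approximate the old sampling behavior is to tune Top-P and temperature instead. The values below are illustrative starting points, not recommendations.

```python
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Write a tagline for a dried-flower shop.",
    config=types.GenerateContentConfig(
        temperature=0.9,  # raise for more variety, lower for more determinism
        top_p=0.8,        # tune Top-P in place of the unsupported Top-K
    ),
)
print(response.text)
```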
Step 4: Perform offline evaluation
Repeat the evaluation that you did when you originally developed and launched your application, any further offline evaluation you did after launching, and any additional evaluation you identified in step 2. If you then feel that your evaluation doesn't fully capture the breadth and depth of your application, do further evaluation.
If you don't have an automated way to run your offline evaluations, consider using the Gen AI evaluation service.
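As a hedged illustration, the following sketch runs a pointwise metric over a tiny dataset with the Gen AI evaluation service in the Vertex AI SDK. The project ID, experiment name, and data are placeholders, and your metric choices will differ.

```python
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

# Prompts paired with responses produced offline by your Gemini 2 candidate.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Summarize our refund policy in one sentence."],
        "response": ["Refunds are accepted within 30 days of purchase."],
    }
)

eval_result = EvalTask(
    dataset=eval_dataset,
    metrics=[MetricPromptTemplateExamples.Pointwise.FLUENCY],
    experiment="gemini-2-migration-eval",  # hypothetical experiment name
).evaluate()
print(eval_result.summary_metrics)
```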
If your application uses fine-tuning, perform offline evaluation before retuning your model with Gemini 2. Gemini 2's improved output quality may mean that your application no longer requires a fine-tuned model.
Step 5: Assess results and tune prompts
If your offline evaluation shows a drop in performance with Gemini 2, iterate on your application as follows until Gemini performance matches the older model:
If your application already relies on fine-tuning, try fine-tuning Gemini 2.
If your application is impacted by Dynamic Retrieval and Top-K breaking changes, experiment with changing your prompt and token sampling parameters.
Step 6: Conduct load testing
If your application requires a certain minimum throughput, perform load testing to make sure the Gemini 2 version of your application meets your throughput requirements.
Load testing should happen before online evaluation, because online evaluation requires exposing Gemini 2 to production traffic. Use your existing load testing instrumentation to perform this step.
If your application already uses Provisioned Throughput, you'll need additional short-term Provisioned Throughput to cover load testing while your existing Provisioned Throughput order continues to serve production traffic.
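If you don't already have load testing instrumentation, the following is a minimal concurrency sketch, not a replacement for a proper load testing tool. The concurrency level, request count, and prompt are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from google import genai

client = genai.Client()  # assumes the environment variables from step 3 are set

def timed_request(_):
    # Send one request and return its end-to-end latency in seconds.
    start = time.perf_counter()
    client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Reply with the single word: pong",
    )
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=8) as pool:  # illustrative concurrency
    latencies = sorted(pool.map(timed_request, range(32)))

print(f"p50={latencies[len(latencies) // 2]:.2f}s, max={latencies[-1]:.2f}s")
```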
Step 7: Perform online evaluation
Only proceed to online evaluation if your offline evaluation shows adequate Gemini output quality and your application requires online evaluation.
Online evaluation is a special case of online testing. Try to use your organization's existing tools and procedures for online evaluation. For example:
If your organization regularly conducts A/B tests, perform an A/B test that evaluates the current implementation of your application compared to the Gemini 2 version.
If your organization regularly conducts canary deployments, perform a canary deployment with the Gemini 2 version and measure differences in user behavior.
Online evaluation can also be done by building new feedback and measurement capabilities into your application. Different feedback and measurement capabilities are appropriate for different applications. For example:
Adding thumbs-up and thumbs-down buttons next to model outputs and comparing thumbs-up versus thumbs-down rates between an older model and Gemini 2.
Presenting users with the older model and Gemini 2 outputs side by side and asking users to pick their favorite.
Tracking how often users override or manually adjust older model versus Gemini 2 outputs.
These kinds of feedback mechanisms often require running a Gemini 2 version of your application in parallel to your existing version. This parallel deployment is sometimes called "shadow mode" or "blue-green deployment".
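As a sketch of how such feedback might be compared, the counts below are hypothetical thumbs-up tallies from a shadow-mode run, and the two-proportion z-test from statsmodels is one of several reasonable significance checks.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical thumbs-up counts collected during a shadow-mode deployment.
old_up, old_total = 840, 1000  # older model
new_up, new_total = 905, 1000  # Gemini 2 version

print(f"older model: {old_up / old_total:.1%}, Gemini 2: {new_up / new_total:.1%}")

# Two-proportion z-test: a small p-value suggests the difference in
# thumbs-up rates is unlikely to be noise.
stat, p_value = proportions_ztest([new_up, old_up], [new_total, old_total])
print(f"z={stat:.2f}, p={p_value:.4f}")
```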
If online evaluation results differ significantly from offline evaluation results, your offline evaluation is not capturing key aspects of the live environment or user experience. Use the online evaluation findings to devise a new offline evaluation to cover the gap the online evaluation exposed, and then return to step 3.
Step 8: Roll out to production
Once your evaluation shows that Gemini 2 meets or exceeds the performance of the older model, turn down the existing version of your application in favor of the Gemini 2 version. Follow your organization's existing procedures for production rollout.
If you're using Provisioned Throughput, change your Provisioned Throughput order to your chosen Gemini 2 model. If you're rolling out your application incrementally, use short-term Provisioned Throughput to meet throughput requirements for two different Gemini models.
Improving model performance
As you complete your migration, use the following tips to maximize Gemini 2 model performance:
Test a more powerful model. For example, if you evaluated Gemini 2.0 Flash-Lite, try Gemini 2.0 Flash.
Examine any automated evaluation results to make sure they match human judgment, especially results that use a judge model. Make sure your judge model instructions don't contain inconsistencies or ambiguities.
One way to improve judge model instructions is to test the instructions with multiple humans in isolation and see if their judgments are consistent. If humans interpret the instructions differently and render different judgments, your judge model instructions are ambiguous.
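One lightweight way to quantify that consistency check is shown below. The judgments are hypothetical, and Cohen's kappa (computed here with scikit-learn) corrects raw agreement for chance.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical pass/fail judgments from two humans applying the same
# judge-model instructions to ten outputs (1 = pass, 0 = fail).
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 0, 0, 1, 1, 1, 1, 0, 0, 1]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"raw agreement: {agreement:.0%}")
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
# Low agreement or kappa suggests the instructions are ambiguous.
```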
Examine evaluation outputs to look for patterns that show specific kinds of failures. Grouping failures into distinct modes, kinds, or categories gives you more targeted evaluation data, which makes it easier to adjust prompts to address these errors.
Make sure you are independently evaluating different generative AI components.
If you need help, Google Cloud offers support packages to meet your needs, such as 24/7 coverage, phone support, and access to a technical support manager. For more information, see Google Cloud Support.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-07-02 UTC."],[],[]]