Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Migrate your application to Gemini 2 with the Gemini API in Vertex AI
This guide shows how to migrate generative AI applications from Gemini 1.x and PaLM models to Gemini 2 models.
Why migrate to Gemini 2?
Gemini 2 delivers significant performance improvements over Gemini 1.x and PaLM models, along with new capabilities. Additionally, each model version has its own version support and availability timeline.
Upgrading most generative AI applications to Gemini 2 shouldn't require significant reengineering of prompts or code. But some applications require prompt changes, and these changes are difficult to predict without running a prompt through Gemini 2 first. Therefore, Gemini 2 testing is recommended before migration.
Significant code changes are only needed for certain breaking changes, or to use new Gemini 2 capabilities.
Which Gemini 2 model should I migrate to?
As you choose a Gemini 2 model to migrate to, you'll want to consider the features that your application requires, as well as the cost of those features.
For an overview of Gemini 2 model features, see Gemini 2. For an overview of all Google models, see Google models.
This document outlines an eight-step process for migrating your application to Gemini 2. The following sections describe each step.
Before you begin
Step 1: Complete prerequisites
For a seamless Gemini 2 migration, we recommend that you address the following concerns before you begin the migration process.
Model retirement awareness: Note the model version support and availability timelines for older Gemini models, and make sure your migration is completed before the model you're using is retired.
InfoSec, governance, and regulatory approvals: Proactively request the approvals you need for Gemini 2 from your information security (InfoSec), risk, and compliance stakeholders. Make sure that you cover domain-specific risk and compliance constraints, especially in heavily regulated industries such as healthcare and financial services. Note that Gemini security controls differ among Gemini 2 models.
Modality and tokenization-based pricing differences: Check Gemini 2 pricing for all the modalities (text, code, images, speech) in your application. For more information, see the generative AI pricing page. Note that Gemini 2 text input and output is priced per token, while Gemini 1 text input and output is priced per character.
Supervised fine-tuning: If your Gemini application uses supervised fine-tuning, submit a new tuning job with Gemini 2. We recommend that you start with the default tuning hyperparameters instead of reusing the hyperparameter values that you used with previous Gemini versions. The tuning service has been optimized for Gemini 2, so reusing previous hyperparameter values might not yield the best results. A minimal tuning-job sketch appears after this list.
Regression testing: There are three main types of regression tests involved when upgrading to Gemini 2 models:
Code regression tests: Regression testing from a software engineering and DevOps perspective. This type of regression test is always required.
Model performance regression tests: Regression testing from a data science or machine learning perspective. This means ensuring that the new Gemini 2 model provides outputs that are at least as high-quality as outputs from the current production model. Model performance regression tests are just model evaluations done as part of a change to a system or to the underlying model. Model performance regression testing further breaks down into:
Offline model performance testing: Assessing the quality of model outputs in a dedicated experimentation environment based on various model output quality metrics.
Online model performance testing: Assessing the quality of model outputs in a live online deployment based on implicit or explicit user feedback.
Load testing: Assessing how the application handles high volumes of inference requests. This type of regression test is required for applications that require Provisioned Throughput.
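For the fine-tuning prerequisite above, the following is a minimal sketch of submitting a new tuning job through the Gen AI SDK. The base model, Cloud Storage path, and display name are placeholder assumptions; no hyperparameters are overridden, so the Gemini 2 tuning defaults apply.

```python
from google import genai
from google.genai import types

# Assumes the GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and
# GOOGLE_GENAI_USE_VERTEXAI environment variables are set (see step 3).
client = genai.Client()

tuning_job = client.tunings.tune(
    base_model="gemini-2.0-flash-001",  # placeholder: choose a tunable Gemini 2 model
    training_dataset=types.TuningDataset(
        gcs_uri="gs://your-bucket/training-data.jsonl",  # hypothetical dataset path
    ),
    # No hyperparameter overrides: start from the Gemini 2 tuning defaults.
    config=types.CreateTuningJobConfig(
        tuned_model_display_name="gemini-2-migration-tuning",  # hypothetical name
    ),
)
print(tuning_job.name, tuning_job.state)
```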
Step 2: Document evaluation and testing requirements
Prepare to repeat any relevant evaluations from when you originally built your application, along with any relevant evaluations you have done since then.
If you feel your existing evaluations don't appropriately cover or measure the breadth of tasks that your application performs, you should design and prepare additional evaluations.
If your application involves RAG, tool use, complex agentic workflows, or prompt chains, make sure that your existing evaluation data allows for assessing each component independently. If not, gather input-output examples for each component, as illustrated after this list.
If your application is especially high-impact, or if it's part of a larger user-facing real-time system, you should include online evaluation.
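If you need to assemble per-component examples, the records below are a hypothetical illustration for a RAG application. The field names and contents are assumptions, not a required schema.

```python
# Hypothetical per-component evaluation records for a RAG application.
# Scoring each component independently makes it easier to localize any
# quality regression after the migration.
retrieval_examples = [
    # Given a query, which documents should the retriever return?
    {"query": "What is our refund window?", "expected_doc_ids": ["policy-12"]},
]
generation_examples = [
    # Given retrieved context, what should the model generate?
    {
        "prompt": (
            "Answer using only the context below.\n"
            "Context: Refunds are accepted within 30 days of purchase.\n"
            "Question: What is our refund window?"
        ),
        "reference": "Refunds are accepted within 30 days of purchase.",
    },
]
```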
Step 3: Upgrade and test code
If your Gemini 1.x application uses the Vertex AI SDK, consider upgrading to the Gen AI SDK. New Gemini 2 capabilities are only available in the Gen AI SDK. However, there is no need to switch to the Gen AI SDK if your application only requires capabilities that are available in the Vertex AI SDK.
| Feature / Aspect | Gen AI SDK | Vertex AI SDK |
| --- | --- | --- |
| Primary Focus | Optimized for Gemini models and new generative AI features. | Broader Vertex AI platform, including conventional ML workflows. |
| Gemini 2 Feature Support | Full support for all new Gemini 2 capabilities. | Limited support; new features are not added. |
| Recommendation | Recommended for new applications or when using advanced Gemini 2 features. | Suitable if your application only uses features common to both SDKs and is already integrated. |
We recommend that you migrate to the Gen AI SDK when upgrading to Gemini 2. If you choose to use the Gen AI SDK, the setup process differs from the Vertex AI SDK. For more information, see Google Gen AI SDK.
Set environment variables to use the Gen AI SDK with Vertex AI:
```bash
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
```
Replace GOOGLE_CLOUD_PROJECT with your Google Cloud project ID, and replace GOOGLE_CLOUD_LOCATION with the location of your Google Cloud project (for example, us-central1).

The following is a short code sample that uses the Gen AI SDK for Python:

```python
from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How does AI work?",
)
print(response.text)
# Example response:
# Okay, let's break down how AI works. It's a broad field, so I'll focus on the ...
# Here's a simplified overview:
# ...
```
The following is a short code sample that uses the Vertex AI SDK for Python:
```python
import vertexai
from vertexai.generative_models import GenerativeModel

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-2.0-flash-001")
response = model.generate_content(
    "What's a good name for a flower shop that specializes in selling bouquets of dried flowers?"
)
print(response.text)
# Example response:
# **Emphasizing the Dried Aspect:**
# * Everlasting Blooms
# * Dried & Delightful
# * The Petal Preserve
# ...
```
Replace PROJECT_ID with your Google Cloud project ID. When you migrate an existing application, change the model ID in your code from the older model (for example, gemini-1.5-flash-002) to a Gemini 2 model (for example, gemini-2.0-flash).
Change your Gemini calls
Change your prediction code to use Gemini 2. At a minimum, this means changing the model name to a Gemini 2 model at the point where your code loads the model.
The exact code change will differ depending on how you originally implemented your application, and especially whether you used the Gen AI SDK or the Vertex AI SDK.
After you make your code changes, perform code regression testing and other software tests on your code to make sure that it runs. This test is only meant to assess whether the code functions correctly. It's not meant to assess the quality of model responses.
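As a sketch of that code-level check, the following pytest-style smoke test only verifies that the migrated call path runs and returns text; the model ID and prompt are illustrative.

```python
from google import genai

def test_generate_content_smoke():
    # Assumes the environment variables from earlier in this step are set.
    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # the Gemini 2 model you migrated to
        contents="Reply with the single word: pong",
    )
    # Functional check only: a non-empty response means the call path works.
    # Response quality is assessed separately in steps 4 and 7.
    assert response.text
```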
Top-K token sampling parameter: Models after gemini-1.0-pro-vision don't support changing the Top-K parameter.
Focus only on code changes in this step. You may need to make other changes, but wait until you start your evaluation, and then consider the following adjustments based on evaluation results:
If you're switching from dynamic retrieval, you might need to experiment with system instructions to control when the Google Search tool is used (for example, "Only generate queries for the Google Search tool if the user asks about sports. Don't generate queries for any other topic."). Wait until you evaluate before making prompt changes.
If you used the Top-K parameter, adjust other token sampling parameters, such as Top-P, to achieve similar results; see the sketch after this list.
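For example, if your Gemini 1.x code set Top-K, one way to approximate the old sampling behavior is to tune Top-P and temperature instead. The values below are illustrative starting points, not recommendations.

```python
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Write a tagline for a dried-flower shop.",
    config=types.GenerateContentConfig(
        temperature=0.9,  # raise for more variety, lower for more determinism
        top_p=0.8,        # tune Top-P in place of the unsupported Top-K
    ),
)
print(response.text)
```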
Step 4: Perform offline evaluation
Repeat the evaluation that you did when you originally developed and launched your application, any further offline evaluation you did after launching, and any additional evaluation you identified in step 2. If you then feel that your evaluation doesn't fully capture the breadth and depth of your application, do further evaluation.
If you don't have an automated way to run your offline evaluations, consider using the Gen AI evaluation service.
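As a hedged illustration, the following sketch runs a pointwise metric over a tiny dataset with the Gen AI evaluation service in the Vertex AI SDK. The project ID, experiment name, and data are placeholders, and your metric choices will differ.

```python
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

# Prompts paired with responses produced offline by your Gemini 2 candidate.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Summarize our refund policy in one sentence."],
        "response": ["Refunds are accepted within 30 days of purchase."],
    }
)

eval_result = EvalTask(
    dataset=eval_dataset,
    metrics=[MetricPromptTemplateExamples.Pointwise.FLUENCY],
    experiment="gemini-2-migration-eval",  # hypothetical experiment name
).evaluate()
print(eval_result.summary_metrics)
```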
If your application uses fine-tuning, perform offline evaluation before retuning your model with Gemini 2. Gemini 2's improved output quality may mean that your application no longer requires a fine-tuned model.
Step 5: Assess results and tune prompts
If your offline evaluation shows a drop in performance with Gemini 2, iterate on your application as follows until Gemini performance matches the older model:
If your application already relies on fine-tuning, try fine-tuning Gemini 2.
If your application is impacted by Dynamic Retrieval and Top-K breaking changes, experiment with changing your prompt and token sampling parameters.
Step 6: Conduct load testing
If your application requires a certain minimum throughput, perform load testing to make sure the Gemini 2 version of your application meets your throughput requirements.
Load testing should happen before online evaluation, because online evaluation requires exposing Gemini 2 to production traffic. Use your existing load testing instrumentation to perform this step.
If your application already uses Provisioned Throughput, you'll need additional short-term Provisioned Throughput to cover load testing while your existing Provisioned Throughput order continues to serve production traffic.
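If you don't already have load testing instrumentation, the following is a minimal concurrency sketch, not a replacement for a proper load testing tool. The concurrency level, request count, and prompt are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from google import genai

client = genai.Client()  # assumes the environment variables from step 3 are set

def timed_request(_):
    # Send one request and return its end-to-end latency in seconds.
    start = time.perf_counter()
    client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Reply with the single word: pong",
    )
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=8) as pool:  # illustrative concurrency
    latencies = sorted(pool.map(timed_request, range(32)))

print(f"p50={latencies[len(latencies) // 2]:.2f}s, max={latencies[-1]:.2f}s")
```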
Step 7: Perform online evaluation
Only proceed to online evaluation if your offline evaluation shows adequate Gemini output quality and your application requires online evaluation.
Online evaluation is a special case of online testing. Try to use your organization's existing tools and procedures for online evaluation. For example:
If your organization regularly conducts A/B tests, perform an A/B test that evaluates the current implementation of your application compared to the Gemini 2 version.
If your organization regularly conducts canary deployments, perform a canary deployment with the Gemini 2 version and measure differences in user behavior.
Online evaluation can also be done by building new feedback and measurement capabilities into your application. Different feedback and measurement capabilities are appropriate for different applications. For example:
Adding thumbs-up and thumbs-down buttons next to model outputs and comparing thumbs-up versus thumbs-down rates between an older model and Gemini 2.
Presenting users with the older model and Gemini 2 outputs side by side and asking users to pick their favorite.
Tracking how often users override or manually adjust older model versus Gemini 2 outputs.
These kinds of feedback mechanisms often require running a Gemini 2 version of your application in parallel to your existing version. This parallel deployment is sometimes called "shadow mode" or "blue-green deployment".
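As a sketch of how such feedback might be compared, the counts below are hypothetical thumbs-up tallies from a shadow-mode run, and the two-proportion z-test from statsmodels is one of several reasonable significance checks.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical thumbs-up counts collected during a shadow-mode deployment.
old_up, old_total = 840, 1000  # older model
new_up, new_total = 905, 1000  # Gemini 2 version

print(f"older model: {old_up / old_total:.1%}, Gemini 2: {new_up / new_total:.1%}")

# Two-proportion z-test: a small p-value suggests the difference in
# thumbs-up rates is unlikely to be noise.
stat, p_value = proportions_ztest([new_up, old_up], [new_total, old_total])
print(f"z={stat:.2f}, p={p_value:.4f}")
```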
If online evaluation results differ significantly from offline evaluation results, your offline evaluation is not capturing key aspects of the live environment or user experience. Use the online evaluation findings to devise a new offline evaluation to cover the gap the online evaluation exposed, and then return to step 3.
Step 8: Roll out to production
Once your evaluation shows that Gemini 2 meets or exceeds the performance of the older model, turn down the existing version of your application in favor of the Gemini 2 version. Follow your organization's existing procedures for production rollout.
If you're using Provisioned Throughput, change your Provisioned Throughput order to your chosen Gemini 2 model. If you're rolling out your application incrementally, use short-term Provisioned Throughput to meet throughput requirements for two different Gemini models.
Improving model performance
As you complete your migration, use the following tips to maximize Gemini 2 model performance:
Test a more powerful model. For example, if you evaluated Gemini 2.0 Flash-Lite, try Gemini 2.0 Flash.
Examine any automated evaluation results to make sure they match human judgment, especially results that use a judge model. Make sure your judge model instructions don't contain inconsistencies or ambiguities.
One way to improve judge model instructions is to test the instructions with multiple humans in isolation and see if their judgments are consistent. If humans interpret the instructions differently and render different judgments, your judge model instructions are ambiguous.
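One lightweight way to quantify that consistency check is shown below. The judgments are hypothetical, and Cohen's kappa (computed here with scikit-learn) corrects raw agreement for chance.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical pass/fail judgments from two humans applying the same
# judge-model instructions to ten outputs (1 = pass, 0 = fail).
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 0, 0, 1, 1, 1, 1, 0, 0, 1]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"raw agreement: {agreement:.0%}")
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
# Low agreement or kappa suggests the instructions are ambiguous.
```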
Examine evaluation outputs to look for patterns that show specific kinds of failures. Grouping failures into distinct modes, kinds, or categories gives you more targeted evaluation data, which makes it easier to adjust prompts to address these errors.
Make sure you are independently evaluating different generative AI components.
If you need help, Google Cloud offers support packages to meet your needs, such as 24/7 coverage, phone support, and access to a technical support manager. For more information, see Google Cloud Support.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-07-02 UTC."],[],[]]