Migrate your application to Gemini 2 with the Gemini API in Vertex AI

This guide shows how to migrate generative AI applications from Gemini 1.x and PaLM models to Gemini 2 models.

Why migrate to Gemini 2?

Gemini 2 delivers significant performance improvements over Gemini 1.x and PaLM models, along with new capabilities. Additionally, each model version has its own version support, availability, and deprecation timeline.

Upgrading most generative AI applications to Gemini 2 shouldn't require significant reengineering of prompts or code. But some applications require prompt changes, and these changes are difficult to predict without running a prompt through Gemini 2 first. Therefore, Gemini 2 testing is recommended before migration.

Significant code changes are only needed for certain breaking changes, or to use new Gemini 2 capabilities.

Which Gemini 2 model should I migrate to?

As you choose a Gemini 2 model to migrate to, you'll want to consider the features that your application requires, as well as the cost of those features.

For an overview of Gemini 2 model features, see Gemini 2. For an overview of all Google models, see Google models.

For a comparison of Gemini 1.x and Gemini 2 models, see the following table.

| Feature | Gemini 1.0 Pro | Gemini 1.5 Pro | Gemini 1.5 Flash | Gemini 2.0 Flash | Gemini 2.0 Flash-Lite |
| --- | --- | --- | --- | --- | --- |
| Input modalities | text | text, documents, image, video, audio | text, documents, image, video, audio | text, documents, image, video, audio | text, documents, image, video, audio |
| Output modalities | text | text | text | text | text |
| Context window, total token limit | 32,760 | 2,097,152 | 1,048,576 | 1,048,576 | 1,048,576 |
| Output context length | 8,192 | 8,192 | 8,192 | 8,192 | 8,192 |
| Grounding with Search | No | Yes | Yes | Yes | No |
| Function calling | No | Yes | Yes | Yes | Yes |
| Code execution | No | No | No | Yes | No |
| Context caching | No | Yes | Yes | Yes | No |
| Batch prediction | No | Yes | Yes | Yes | Yes |
| Live API | No | No | No | No | No |
| Latency | Slower than 1.5 Flash | Most capable in 1.5 family | Fastest in 1.5 family | Fast + cost efficient | Fast + most cost efficient |
| Fine-tuning | Yes | Yes | Yes | Yes | Yes |
| Recommended SDK | Vertex AI SDK | Vertex AI SDK | Vertex AI SDK | Gen AI SDK | Gen AI SDK |
| Pricing units | Character | Character | Character | Token | Token |

Before you begin

For a seamless Gemini 2 migration, we recommend that you address the following concerns before you begin the migration process.

Deprecation awareness

Note the model version support and availability timelines for older Gemini models, and make sure your migration is completed before the model you're using is discontinued.

InfoSec, governance, and regulatory approvals

Proactively request the approvals you need for Gemini 2 from your information security (InfoSec), risk, and compliance stakeholders. Make sure that you cover domain-specific risk and compliance constraints, especially in heavily regulated industries such as healthcare and financial services. Note that Gemini security controls differ among Gemini 2 models.

Location availability

See the Generative AI on Google Cloud models and partner model availability documentation, and make sure your chosen Gemini 2 model is available in the regions where you need it.

Modality and tokenization-based pricing differences

Check Gemini 2 pricing for all the modalities (text, code, images, speech) in your application. For more information, see the generative AI pricing page. Note that Gemini 2 text input and output is priced per token, while Gemini 1.x text input and output is priced per character.
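To get a rough sense of how the switch from per-character to per-token pricing affects a workload, a back-of-the-envelope comparison can help. The following is a minimal sketch only: the prices and the characters-per-token ratio are made-up placeholders, not real Gemini prices, and in practice you would take token counts from the API rather than from a heuristic.

```python
# All three constants are hypothetical placeholders for illustration.
# Look up real prices on the pricing page and measure real token counts.
PRICE_PER_1K_CHARS = 0.000125  # assumed USD per 1,000 characters
PRICE_PER_1M_TOKENS = 0.15     # assumed USD per 1,000,000 tokens
CHARS_PER_TOKEN = 4.0          # rough assumption; varies by language and content


def cost_per_character(text: str) -> float:
    """Cost of `text` under character-based pricing."""
    return len(text) / 1_000 * PRICE_PER_1K_CHARS


def cost_per_token(token_count: int) -> float:
    """Cost of `token_count` tokens under token-based pricing."""
    return token_count / 1_000_000 * PRICE_PER_1M_TOKENS


def estimate_tokens(text: str) -> int:
    """Crude token estimate; replace with a real token count when available."""
    return max(1, round(len(text) / CHARS_PER_TOKEN))


prompt = "Summarize the following support ticket in two sentences. " * 20
print(f"characters: {len(prompt)}, estimated tokens: {estimate_tokens(prompt)}")
print(f"per-character cost: ${cost_per_character(prompt):.6f}")
print(f"per-token cost:     ${cost_per_token(estimate_tokens(prompt)):.6f}")
```

Running a comparison like this over a sample of real production prompts and responses gives a more useful estimate than a single synthetic prompt.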

Provisioned Throughput

If needed, purchase additional Provisioned Throughput for Gemini 2 or change existing Provisioned Throughput orders.

Supervised fine-tuning

If your Gemini application uses supervised fine-tuning, submit a new tuning job with Gemini 2. We recommend that you start with the default tuning hyperparameters instead of reusing the hyperparameter values that you used with previous Gemini versions. The tuning service has been optimized for Gemini 2. Therefore, reusing previous hyperparameter values might not yield the best results.

Regression testing

There are three main types of regression tests involved when upgrading to Gemini 2 models:

  • Code regression tests: Regression testing from a software engineering and DevOps perspective. This type of regression test is always required.
  • Model performance regression tests: Regression testing from a data science or machine learning perspective. This means ensuring that the new Gemini 2 model provides outputs that are at least as high-quality as outputs from the current production model. Model performance regression tests are just model evaluations done as part of a change to a system or to the underlying model. Model performance regression testing further breaks down into:
    • Offline model performance testing: Assessing the quality of model outputs in a dedicated experimentation environment based on various model output quality metrics.
    • Online model performance testing: Assessing the quality of model outputs in a live online deployment based on implicit or explicit user feedback.
  • Load testing: Assessing how the application handles high volumes of inference requests. This type of regression test is required for applications that require Provisioned Throughput.

Document model evaluation and testing requirements

  1. Prepare to repeat any relevant evaluations from when you originally built your application, along with any relevant evaluations you have done since then.
  2. If you feel your existing evaluations don't appropriately cover or measure the breadth of tasks that your application performs, you should design and prepare additional evaluations.
  3. If your application involves RAG, tool use, complex agentic workflows, or prompt chains, make sure that your existing evaluation data allows for assessing each component independently. If not, gather input-output examples for each component.
  4. If your application is especially high-impact, or if it's part of a larger user-facing real-time system, you should include online evaluation.

Code upgrades and testing

Consider upgrading to the Google Gen AI SDK

If your Gemini 1.x application uses the Vertex AI SDK, consider upgrading to the Gen AI SDK. New Gemini 2 capabilities are only available in the Gen AI SDK. However, there is no need to switch to the Gen AI SDK if your application only requires capabilities that are available in the Vertex AI SDK. If you're new to the Gen AI SDK, see the Getting started with Google Generative AI using the Gen AI SDK notebook.

Gen AI SDK

We recommend that you migrate to the Gen AI SDK when upgrading to Gemini 2.0.

If you choose to use the Gen AI SDK, the setup process is different from the Vertex AI SDK.

For more information, visit Google Gen AI SDK.

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="How does AI work?",
)
print(response.text)
# Example response:
# Okay, let's break down how AI works. It's a broad field, so I'll focus on the ...
#
# Here's a simplified overview:
# ...

Replace GOOGLE_CLOUD_PROJECT with your Google Cloud project ID, and replace GOOGLE_CLOUD_LOCATION with the location of your Google Cloud project (for example, us-central1).

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"

	"google.golang.org/genai"
)

// generateWithText shows how to generate text using a text prompt.
func generateWithText(w io.Writer) error {
	ctx := context.Background()

	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		HTTPOptions: genai.HTTPOptions{APIVersion: "v1"},
	})
	if err != nil {
		return fmt.Errorf("failed to create genai client: %w", err)
	}

	resp, err := client.Models.GenerateContent(ctx,
		"gemini-2.0-flash-001",
		genai.Text("How does AI work?"),
		nil,
	)
	if err != nil {
		return fmt.Errorf("failed to generate content: %w", err)
	}

	respText, err := resp.Text()
	if err != nil {
		return fmt.Errorf("failed to convert model response to text: %w", err)
	}
	fmt.Fprintln(w, respText)
	// Example response:
	// That's a great question! Understanding how AI works can feel like ...
	// ...
	// **1. The Foundation: Data and Algorithms**
	// ...

	return nil
}

Vertex AI SDK

If you reuse the Vertex AI SDK, the setup process is the same for the 1.0, 1.5, and 2.0 models. For more information, see Introduction to the Vertex AI SDK for Python.

Install the SDK:

pip install --upgrade --quiet google-cloud-aiplatform

The following is a short code sample that uses the Vertex AI SDK for Python:

import vertexai
from vertexai.generative_models import GenerativeModel

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-2.0-flash-001")

response = model.generate_content(
    "What's a good name for a flower shop that specializes in selling bouquets of dried flowers?"
)

print(response.text)
# Example response:
# **Emphasizing the Dried Aspect:**
# * Everlasting Blooms
# * Dried & Delightful
# * The Petal Preserve
# ...

Replace PROJECT_ID with your Google Cloud project ID, and replace us-central1 with the location of your Google Cloud project if it differs. If you're migrating existing code, change the model ID from a Gemini 1.x model, such as gemini-1.5-flash-002, to a Gemini 2 model, such as gemini-2.0-flash-001.

Change your Gemini calls

Change your prediction code to use Gemini 2. At a minimum, this means changing the specific model endpoint name to a Gemini 2 model where you load your model.

The exact code change will differ depending on how you originally implemented your application, and especially whether you used the Gen AI SDK or the Vertex AI SDK.
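One way to keep the model swap small is to read the model ID from configuration instead of hard-coding it at each call site. The following is a minimal sketch of that idea; the environment variable name APP_GEMINI_MODEL and the helper resolve_model_id are hypothetical names chosen for illustration.

```python
import os

# Hypothetical environment variable name; use whatever fits your deployment.
MODEL_ENV_VAR = "APP_GEMINI_MODEL"
# Default to the Gemini 2 model ID used in this guide's samples.
DEFAULT_MODEL = "gemini-2.0-flash-001"


def resolve_model_id(env=None) -> str:
    """Return the model ID to use, preferring an override from the environment."""
    env = os.environ if env is None else env
    # Fall back to the default when the variable is unset or blank.
    return env.get(MODEL_ENV_VAR, DEFAULT_MODEL).strip() or DEFAULT_MODEL


print(resolve_model_id())
```

With this pattern, switching between the older model and a Gemini 2 model during evaluation or rollback becomes a configuration change rather than a code change.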

After you make your code changes, perform code regression testing and other software tests on your code to make sure that it runs. This test is only meant to assess whether the code functions correctly. It's not meant to assess the quality of model responses.
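A code regression test of this kind can run without calling the model at all. The sketch below assumes a hypothetical application function, summarize, that wraps a single generate_content call in the same shape as the Gen AI SDK sample in this guide; FakeModels is a hand-written test double, not part of any SDK.

```python
from types import SimpleNamespace


def summarize(client, model_id: str, text: str) -> str:
    """Hypothetical application function wrapping one generate_content call."""
    response = client.models.generate_content(
        model=model_id,
        contents=f"Summarize in one sentence: {text}",
    )
    return response.text.strip()


class FakeModels:
    """Test double that records calls instead of contacting the API."""

    def __init__(self):
        self.calls = []

    def generate_content(self, model, contents):
        self.calls.append({"model": model, "contents": contents})
        return SimpleNamespace(text="  A short summary.  ")


def test_summarize_uses_configured_model():
    fake_client = SimpleNamespace(models=FakeModels())
    result = summarize(fake_client, "gemini-2.0-flash-001", "Long text ...")
    # The code path works end to end and post-processes the response.
    assert result == "A short summary."
    # The request was sent to the model ID we intended to migrate to.
    assert fake_client.models.calls[0]["model"] == "gemini-2.0-flash-001"


test_summarize_uses_configured_model()
print("code regression test passed")
```

The test checks only that the code runs and targets the intended model. It says nothing about response quality, which is covered by the evaluation steps.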

Address breaking code changes

Focus only on code changes in this step. You might need to make other changes, but wait until you start your evaluation, and then consider the following adjustments based on the evaluation results:

  • If you're switching from dynamic retrieval, you might need to experiment with system instructions to control when Google Search is used (for example, "Only generate queries for the Google Search tool if the user asks about sports. Don't generate queries for any other topic."), but wait until you evaluate before making prompt changes.
  • If you used the Top-K parameter, adjust other token sampling parameters, such as Top-P, to achieve similar results.
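For the Top-K case, one possible starting point is a small helper that rewrites a legacy sampling configuration. This is a sketch under assumptions: the configuration is represented as a plain dictionary, migrate_sampling_config is a hypothetical helper, and the fallback Top-P value is an arbitrary starting point that you would tune based on evaluation results.

```python
def migrate_sampling_config(old: dict, fallback_top_p: float = 0.95) -> dict:
    """Return a copy of `old` without top_k, making sure top_p is set.

    `fallback_top_p` is an illustrative starting value, not a recommendation.
    """
    # Copy every setting except top_k; the input dictionary is left untouched.
    new = {key: value for key, value in old.items() if key != "top_k"}
    # Only introduce top_p when top_k was doing the restricting before.
    if "top_k" in old and "top_p" not in new:
        new["top_p"] = fallback_top_p
    return new


legacy = {"temperature": 0.2, "top_k": 40}
print(migrate_sampling_config(legacy))
```

An existing top_p value is preserved as-is, so the helper changes only configurations that relied on Top-K alone.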

Offline evaluation

Repeat the evaluation that you did when you originally developed and launched your application, any further offline evaluation you did after launching, and any additional evaluation you identified when you documented your model evaluation and testing requirements. If your evaluation still doesn't fully capture the breadth and depth of your application, do further evaluation.

If you don't have an automated way to run your offline evaluations, consider using the Gen AI evaluation service.

If your application uses fine-tuning, perform offline evaluation before retuning your model with Gemini 2. Gemini 2's improved output quality may mean that your application no longer requires a fine-tuned model.
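If you want a quick local check before setting up a full evaluation pipeline, an offline comparison can be as simple as running both models over the same evaluation set and comparing an aggregate score. The following is a minimal sketch: the evaluation set, the exact-match metric, and the two stand-in model functions are all hypothetical, and a real application would use its own data, metrics, and model calls.

```python
from statistics import mean
from typing import Callable

# Hypothetical evaluation set: each item has a prompt and a reference answer.
EVAL_SET = [
    {"prompt": "2 + 2 = ?", "reference": "4"},
    {"prompt": "Capital of France?", "reference": "Paris"},
    {"prompt": "Opposite of hot?", "reference": "cold"},
]


def exact_match(output: str, reference: str) -> float:
    """Toy metric: 1.0 if the normalized strings match, else 0.0."""
    return float(output.strip().lower() == reference.strip().lower())


def evaluate(generate: Callable[[str], str], eval_set=EVAL_SET, metric=exact_match) -> float:
    """Average metric score of `generate` over the evaluation set."""
    return mean(metric(generate(item["prompt"]), item["reference"]) for item in eval_set)


def compare(baseline, candidate, tolerance: float = 0.0) -> dict:
    """Score both models and flag a regression beyond `tolerance`."""
    baseline_score = evaluate(baseline)
    candidate_score = evaluate(candidate)
    return {
        "baseline": baseline_score,
        "candidate": candidate_score,
        "regressed": candidate_score < baseline_score - tolerance,
    }


# Stand-ins for real model calls (the older model and the Gemini 2 model).
def old_model(prompt: str) -> str:
    return {"2 + 2 = ?": "4", "Capital of France?": "paris"}.get(prompt, "")


def new_model(prompt: str) -> str:
    return {"2 + 2 = ?": "4", "Capital of France?": "Paris", "Opposite of hot?": "Cold"}.get(prompt, "")


print(compare(old_model, new_model))
```

Keeping the evaluation set and metric fixed between runs is what makes the two scores comparable.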

Assess evaluation results and tune the Gemini 2 prompts and hyperparameters

If your offline evaluation shows a drop in performance with Gemini 2, iterate on your prompts and hyperparameters until Gemini 2 performance matches that of the older model. For specific techniques, see Improving model performance.

Load testing

If your application requires a certain minimum throughput, perform load testing to make sure the Gemini 2 version of your application meets your throughput requirements.

Load testing should happen before online evaluation, because online evaluation requires exposing Gemini 2 to production traffic. Use your existing load testing instrumentation to perform this step.

If your application already uses Provisioned Throughput to meet its throughput requirements, you'll need additional short-term Provisioned Throughput to cover load testing while your existing Provisioned Throughput order continues to serve production traffic.
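If you don't have load testing instrumentation yet, the basic measurement is straightforward to sketch. The example below is illustrative only: run_load_test is a hypothetical helper, fake_call stands in for a real inference request, and a production load test would also need ramp-up, realistic payloads, and rate-limit handling.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(call, num_requests: int, concurrency: int) -> dict:
    """Issue `num_requests` calls using `concurrency` worker threads."""

    def timed_call(index):
        start = time.perf_counter()
        try:
            call(index)
            ok = True
        except Exception:
            ok = False  # Count failures instead of aborting the run.
        return ok, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(num_requests)))
    wall_seconds = time.perf_counter() - wall_start

    latencies = sorted(latency for _, latency in results)
    # Nearest-rank 95th percentile using integer arithmetic.
    p95_index = (95 * len(latencies) + 99) // 100 - 1
    return {
        "requests": num_requests,
        "errors": sum(1 for ok, _ in results if not ok),
        "throughput_rps": num_requests / wall_seconds,
        "p95_latency_s": latencies[p95_index],
    }


def fake_call(index):
    """Stand-in for a real model request; sleeps to simulate latency."""
    time.sleep(0.01)


print(run_load_test(fake_call, num_requests=20, concurrency=5))
```

Replacing fake_call with a function that sends a representative request to your Gemini 2 deployment turns the sketch into a basic throughput probe.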

Online evaluation

Only proceed to online evaluation if your offline evaluation shows adequate Gemini output quality and your application requires online evaluation.

Online evaluation is a special case of online testing. Try to use your organization's existing tools and procedures for online evaluation. For example:

  • If your organization regularly conducts A/B tests, perform an A/B test that evaluates the current implementation of your application compared to the Gemini 2 version.
  • If your organization regularly conducts canary deployments, be sure to do so with Gemini 2 and measure differences in user behavior.
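If your organization has no A/B testing or canary tooling, a deterministic traffic split is one simple way to route a fixed share of users to the Gemini 2 version. The following sketch assumes string user IDs; assign_variant and the salt value are hypothetical names for illustration.

```python
import hashlib


def assign_variant(user_id: str, candidate_percent: int, salt: str = "gemini2-migration") -> str:
    """Deterministically assign a user to 'baseline' or 'candidate'."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # Stable bucket in the range [0, 100).
    return "candidate" if bucket < candidate_percent else "baseline"


# Send roughly 10% of users to the Gemini 2 version.
for user in ["user-1", "user-2", "user-3"]:
    print(user, assign_variant(user, candidate_percent=10))
```

Hashing the user ID keeps each user on the same variant across sessions, and raising candidate_percent only moves users from the baseline to the candidate, never the other way.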

Online evaluation can also be done by building new feedback and measurement capabilities into your application. Different feedback and measurement capabilities are appropriate for different applications. For example:

  • Adding thumbs-up and thumbs-down buttons next to model outputs and comparing thumbs-up versus thumbs-down rates between an older model and Gemini 2.
  • Presenting users with the older model and Gemini 2 outputs side-by-side and asking users to pick their favorite.
  • Tracking how often users override or manually adjust older model versus Gemini 2 outputs.
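For the thumbs-up and thumbs-down comparison, it helps to check whether a difference in rates is larger than what chance alone would produce. The following is a minimal sketch using a standard two-proportion z-test; the feedback counts are made-up numbers for illustration.

```python
import math


def thumbs_up_rate(up: int, down: int) -> float:
    """Share of ratings that were thumbs-up; 0.0 when there are no ratings."""
    total = up + down
    return up / total if total else 0.0


def two_proportion_z(up_a: int, n_a: int, up_b: int, n_b: int) -> float:
    """z-statistic for the difference in thumbs-up rate (b minus a)."""
    p_a, p_b = up_a / n_a, up_b / n_b
    pooled = (up_a + up_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / standard_error if standard_error else 0.0


# Hypothetical feedback counts: (thumbs-up, total ratings).
old_up, old_total = 400, 500
new_up, new_total = 430, 500

z = two_proportion_z(old_up, old_total, new_up, new_total)
print(f"older model rate: {thumbs_up_rate(old_up, old_total - old_up):.2f}")
print(f"Gemini 2 rate:    {thumbs_up_rate(new_up, new_total - new_up):.2f}")
print(f"z-statistic:      {z:.2f} (|z| > 1.96 is significant at roughly the 5% level)")
```

A small sample can show a large difference in rates that isn't statistically meaningful, so collect enough ratings before drawing conclusions.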

These kinds of feedback mechanisms often require running a Gemini 2 version of your application in parallel to your existing version. This parallel deployment is sometimes called "shadow mode" or "blue-green deployment".

If online evaluation results differ significantly from offline evaluation results, your offline evaluation is not capturing key aspects of the live environment or user experience. Use the online evaluation findings to devise a new offline evaluation to cover the gap the online evaluation exposed, and then return to the offline evaluation step.

If you use Provisioned Throughput, you may need to purchase additional short-term Provisioned Throughput to continue to meet your throughput requirements for users subject to online evaluation.

Production deployment

Once your evaluation shows that Gemini 2 meets or exceeds performance of an older model, turn down the existing version of your application in favor of the Gemini 2 version. Follow your organization's existing procedures for production rollout.

If you're using Provisioned Throughput, change your Provisioned Throughput order to your chosen Gemini 2 model. If you're rolling out your application incrementally, use short-term Provisioned Throughput to meet throughput requirements for two different Gemini models.

Improving model performance

As you complete your migration, use the following tips to maximize Gemini 2 model performance:

  • Inspect your system instructions, prompts, and few-shot learning examples for any inconsistencies, contradictions, or irrelevant instructions and examples.
  • Test a more powerful model. For example, if you evaluated Gemini 2.0 Flash-Lite, try Gemini 2.0 Flash.
  • Examine any automated evaluation results to make sure they match human judgment, especially results that use a judge model. Make sure your judge model instructions don't contain inconsistencies or ambiguities.
    • One way to improve judge model instructions is to test the instructions with multiple humans in isolation and see if their judgments are consistent. If humans interpret the instructions differently and render different judgments, your judge model instructions are ambiguous.
  • Fine-tune the Gemini 2 model.
  • Examine evaluation outputs to look for patterns that show specific kinds of failures. Grouping together failures into different models, kinds, or categories gives you more targeted evaluation data, which makes it easier to adjust prompts to address these errors.
  • Make sure you are independently evaluating different generative AI components.
  • Experiment with adjusting token sampling parameters.
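The tips about grouping failures and evaluating components independently can be combined in a small analysis step over your evaluation records. The sketch below assumes each record carries a component name, a pass/fail flag, and a failure label; the record schema and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical evaluation records; your own schema will differ.
records = [
    {"id": 1, "component": "retrieval", "passed": False, "failure": "missing_document"},
    {"id": 2, "component": "generation", "passed": True, "failure": None},
    {"id": 3, "component": "generation", "passed": False, "failure": "wrong_format"},
    {"id": 4, "component": "retrieval", "passed": False, "failure": "missing_document"},
    {"id": 5, "component": "generation", "passed": False, "failure": "hallucination"},
]


def summarize_failures(records):
    """Count failures per (component, failure) pair and collect example IDs."""
    counts = Counter()
    examples = defaultdict(list)
    for record in records:
        if record["passed"]:
            continue
        key = (record["component"], record["failure"])
        counts[key] += 1
        examples[key].append(record["id"])
    # most_common() sorts categories from most to least frequent.
    return counts.most_common(), dict(examples)


ranked, example_ids = summarize_failures(records)
for (component, failure), count in ranked:
    print(f"{component:<12} {failure:<18} {count}  examples: {example_ids[(component, failure)]}")
```

Starting with the most frequent category usually gives the largest improvement per prompt change, and the example IDs make it easy to pull up the failing cases.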

Getting help

If you need help, Google Cloud offers support packages to meet your needs, such as 24/7 coverage, phone support, and access to a technical support manager. For more information, see Google Cloud Support.

What's next