This guide explains how to update your application to the latest Gemini version. It assumes your application already uses an older Gemini version. To learn how to start using Gemini in Vertex AI, see the Gemini API in Vertex AI quickstart.
This guide doesn't cover how to switch your application from the Vertex AI SDK to the current Google Gen AI SDK. For that information, see our Vertex AI SDK migration guide.
What changes should I expect?
Updating most generative AI applications to the latest Gemini version requires few code or prompt changes. However, some applications may require prompt adjustments. It's hard to predict these changes without first testing your prompts with the new version. Thorough testing is recommended before fully migrating. For tips on creating effective prompts, see our prompt strategy guidance. Use our prompt health checklist to help find and fix prompt issues.
You only need to make major code changes for certain breaking changes or to use new Gemini capabilities.
Which Gemini model should I migrate to?
The Gemini model you use depends on your application's needs. The following table compares the older Gemini 1.5 models with the latest Gemini models:
Feature | 1.5 Pro | 1.5 Flash | 2.0 Flash | 2.0 Flash-Lite | 2.5 Pro | 2.5 Flash | 2.5 Flash-Lite
---|---|---|---|---|---|---|---
Launch stage | Deprecated | Deprecated | Generally available | Generally available | Generally available | Generally available | Generally available
Context window, total token limit | 2,097,152 | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576
Output context length | 8,192 (default) | 8,192 (default) | 8,192 (default) | 8,192 (default) | 65,535 (default) | 65,535 (default) | 65,536 (default)
Recommended SDK | Vertex AI SDK | Vertex AI SDK | Gen AI SDK | Gen AI SDK | Gen AI SDK | Gen AI SDK | Gen AI SDK
Pricing units | Character | Character | Token | Token | Token | Token | Token
Retirement date | September 24, 2025 | September 24, 2025 | February 5, 2026 | February 25, 2026 | June 17, 2026 | June 17, 2026 | July 22, 2026

Per-model support for input and output modalities, Grounding with Google Search, function calling, code execution, context caching, batch prediction, the Live API*, fine-tuning, and latency varies by model; see each model's reference page for details.

\* The Live API is available as a preview offering as part of `gemini-live-2.5-flash` and `gemini-live-2.5-flash-preview-native-audio`.
Before you begin migrating
Before you start the migration process, you should consider the following:
- Information security (InfoSec), governance, and regulatory approvals
- Location availability
- Modality and tokenization-based pricing differences
- Purchase or change Provisioned Throughput orders
- Supervised fine-tuning
- Regression testing
InfoSec, governance, and regulatory approvals
Obtain approvals from your information security (InfoSec), risk, and compliance teams early. Cover any specific risk and compliance rules, especially in regulated industries like healthcare and finance.
Location availability
Google and Partner models and generative AI features on Vertex AI are available through specific regional endpoints and a global endpoint. Global endpoints cover the entire world and offer improved availability and reliability compared to single regions.
Regional endpoint availability varies by model. For details on each model, see our locations guide.
Modality and tokenization-based pricing differences
Pricing varies between each Gemini model. Our pricing page lists costs for all modalities (text, code, images, speech, etc.) per model.
Purchase or change Provisioned Throughput orders
If needed, purchase more Provisioned Throughput or change existing Provisioned Throughput orders.
Supervised fine-tuning
The latest Gemini models offer better output quality. This can mean your application no longer needs a fine-tuned model. If your application uses supervised fine-tuning with an older Gemini model, first test your application with the latest model without fine-tuning and evaluate the results.
If you choose to use supervised fine-tuning, you cannot move your existing tuned model from older Gemini versions. You need to run a new tuning job for the new Gemini version.
When tuning a new Gemini model, start with the default tuning settings. Don't reuse hyperparameter values from previous Gemini versions, because the tuning service is optimized for the latest versions. Reusing old settings is unlikely to give optimal results.
Regression testing
When upgrading to the latest Gemini version, you'll need three main types of regression tests:
- Code regression tests: Regression testing from a software engineering and developer operations (DevOps) perspective. This type of regression testing is always required.
- Model performance regression tests: Regression testing from a data science or machine learning perspective. This means ensuring that the new Gemini model version provides outputs that at least maintain the same level of quality as the previous version. Model performance regression tests are model evaluations done when a system or its underlying model changes. They include:
  - Offline performance testing: Tests that assert the quality of model outputs in a dedicated experimentation environment based on various model output quality metrics.
  - Online model performance testing: Tests that assert the quality of model outputs in a live, online deployment based on implicit or explicit user feedback.
- Load testing: These tests check how well the application handles many requests at once. Load testing is required for applications that use Provisioned Throughput.
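A model performance regression test can be reduced, at its simplest, to a gate that compares the new model's evaluation scores against a baseline. The following sketch is illustrative only: the scores, the mean-score metric, and the tolerance value are assumptions, not part of any Google SDK.

```python
# Minimal sketch of a model-performance regression gate. In practice,
# `baseline_scores` and `candidate_scores` would come from your evaluation
# pipeline (for example, per-example scores from an evaluation service);
# the values and tolerance below are illustrative assumptions.

def passes_regression_gate(baseline_scores, candidate_scores, tolerance=0.02):
    """Return True if the candidate model's mean score is no more than
    `tolerance` below the baseline model's mean score."""
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    candidate_mean = sum(candidate_scores) / len(candidate_scores)
    return candidate_mean >= baseline_mean - tolerance

# Example: the new model slightly outperforms the old one on average.
old_model_scores = [0.82, 0.75, 0.91, 0.68]
new_model_scores = [0.85, 0.74, 0.93, 0.70]
print(passes_regression_gate(old_model_scores, new_model_scores))  # True
```

A real gate would typically also use per-task breakdowns and statistical significance rather than a single mean.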
How to migrate to the latest version
The following sections outline the steps to migrate to the latest Gemini version. For optimal results, complete these steps in order.
1. Document model evaluation and testing requirements
- Prepare to repeat any relevant evaluations you performed when you first built your application, plus any evaluations performed since then.
- If your current evaluations don't fully cover or measure all tasks your application performs, design and prepare more evaluations. You can use our evaluation playbook and our evaluation recipes to help you get started.
- If your application involves RAG, tool use, complex agentic workflows, or prompt chains, make sure that your existing evaluation data allows for assessing each component independently. If not, gather input-output examples for each component.
- If your application is critical or part of a larger user-facing real-time system, include online evaluation.
2. Make code upgrades and run tests
Upgrading your code requires three main changes: upgrading to the Google Gen AI SDK, changing your Gemini calls, and fixing breaking code changes. The following sections go over these changes in further detail.
Upgrade to the Google Gen AI SDK
If your Gemini 1.x application uses the Vertex AI SDK, switch to the Gen AI SDK. See our Vertex AI SDK migration guide for details, including code examples for making similar calls with the Gen AI SDK. Vertex AI SDK releases after June 2026 won't support Gemini, and new Gemini features are only available in the Gen AI SDK.
If you're new to the Gen AI SDK, see the Getting started with Google Generative AI using the Gen AI SDK notebook.
Change your Gemini calls
Update your prediction code to use one of the latest Gemini models. At a minimum, this means changing the model endpoint name.
The exact code changes will vary based on how you built your application, especially whether you used the Gen AI SDK or the Vertex AI SDK.
After making code changes, run code regression tests and other software tests to ensure your code functions as expected. This step checks whether the code functions, but not the quality of model responses.
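As a sketch, the minimal change is swapping in the new model endpoint name. The mapping helper below is hypothetical (choose your own target model from the comparison table earlier in this guide), and the client calls are commented out because they require the `google-genai` package and Google Cloud credentials:

```python
# Hypothetical mapping from retired model IDs to candidate replacements;
# this is an illustration, not an official migration table.
MODEL_REPLACEMENTS = {
    "gemini-1.5-pro": "gemini-2.5-pro",
    "gemini-1.5-flash": "gemini-2.5-flash",
}

def replacement_model(model_id: str) -> str:
    """Return the candidate replacement for an older model ID,
    or the ID unchanged if it isn't in the mapping."""
    return MODEL_REPLACEMENTS.get(model_id, model_id)

# Requires `pip install google-genai` and application default credentials:
# from google import genai
# client = genai.Client(vertexai=True, project="your-project-id", location="global")
# response = client.models.generate_content(
#     model=replacement_model("gemini-1.5-flash"),
#     contents="Summarize this release note in one sentence.",
# )
# print(response.text)
```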
Fix breaking code changes
- Dynamic retrieval: Switch to using Grounding with Google Search. This feature requires the Gen AI SDK and isn't supported by the Vertex AI SDK.
- Content filters: Note the default content filter settings. Change your code if it relies on a default that has changed.
- `Top-K` token sampling parameter: Models after `gemini-1.0-pro-vision` don't support changing the `Top-K` parameter.
For this step, focus only on code changes. You may need to make other changes later, but wait until you start your evaluation. After your evaluations, consider these adjustments based on the evaluation results:
- If you're switching from dynamic retrieval, you may need to adjust your system instructions to control when Google Search is used (for example, "Only generate queries for the Google Search tool if the user asks about sports. Don't generate queries for any other topic."). However, wait until you evaluate before changing prompts.
- If you used the `Top-K` parameter, adjust other token sampling parameters, such as `Top-P`, to get similar results.
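These two adjustments can be sketched together as follows. The sampling values are illustrative starting points only, not recommendations, and the client calls are commented out because they require the `google-genai` package and credentials:

```python
# Sketch: replacing dynamic retrieval with Grounding with Google Search, and
# compensating for the removed Top-K parameter by tuning Top-P (and
# temperature) instead. The values below are illustrative assumptions.
sampling = {"temperature": 0.7, "top_p": 0.9}

# Requires `pip install google-genai` and application default credentials:
# from google import genai
# from google.genai.types import GenerateContentConfig, GoogleSearch, Tool
#
# client = genai.Client(vertexai=True, project="your-project-id", location="global")
# response = client.models.generate_content(
#     model="gemini-2.5-flash",
#     contents="Who won the most recent World Cup?",
#     config=GenerateContentConfig(
#         tools=[Tool(google_search=GoogleSearch())],  # replaces dynamic retrieval
#         **sampling,
#     ),
# )
# print(response.text)
```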
3. Run offline evaluations
Repeat the evaluations you performed when you first developed and launched your application, any offline evaluations done since then, and any additional evaluations you identified in step 1. If you still feel your evaluation doesn't fully cover your application's scope, conduct further evaluations.
If you don't have an automated way to run offline evaluations, consider using the Gen AI evaluation service.
If your application uses fine-tuning, perform offline evaluation before re-tuning your model with the latest version of Gemini. The latest models offer improved output quality, which can mean your application no longer needs a fine-tuned model.
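If you don't yet have any automated evaluation at all, even a very simple scoring harness is better than none. The sketch below uses a normalized exact-match metric and made-up evaluation data; real offline evaluations (for example, with the Gen AI evaluation service) support much richer metrics:

```python
# Minimal offline evaluation sketch: score candidate outputs against
# reference answers with a normalized exact-match metric.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())

def exact_match_rate(predictions, references):
    """Fraction of predictions that exactly match their reference
    after normalization."""
    matches = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return matches / len(references)

# Illustrative data; replace with your application's outputs and references.
references = ["Paris", "4", "blue whale"]
predictions = ["paris", "4", "the blue whale"]
print(exact_match_rate(predictions, references))  # 2 of 3 match
```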
4. Assess evaluation results and tune your prompts and hyperparameters
If your offline evaluation shows that your application performs less effectively with the new model, improve your application until its performance matches the older model. Do this by:
- Iteratively refining your prompts to boost performance (known as "hill climbing"). If you're new to hill climbing, see the Vertex Gemini hill climbing online training. The Vertex AI prompt optimizer (example notebook) can also help.
- Experimenting with prompt and token sampling parameter adjustments, if your application is affected by the dynamic retrieval and `Top-K` breaking changes.
5. Run load tests
If your application needs a certain minimum throughput, perform load testing to ensure the latest version of your application meets your throughput requirements.
Load testing must occur before online evaluation, because online evaluation involves exposing the model to live traffic. Use your existing load testing tools and instrumentation for this step.
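If you don't have dedicated load testing tooling, the core idea can be sketched with a thread pool and per-request latency percentiles. The `call_model` function below is a stub standing in for your real prediction call, and the request counts, concurrency, and simulated latency are assumptions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    """Stub standing in for a real Gemini call; replace the sleep with your
    actual client.models.generate_content(...) invocation."""
    time.sleep(0.01)  # simulate network + inference latency
    return "ok"

def run_load_test(num_requests: int = 50, concurrency: int = 10):
    """Fire `num_requests` requests with up to `concurrency` in flight,
    and report p50/p95 latency in seconds."""
    def timed_call(i):
        start = time.monotonic()
        call_model(f"request {i}")
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(num_requests)))
    return {
        "p50": latencies[len(latencies) // 2],
        "p95": latencies[int(len(latencies) * 0.95)],
    }

print(run_load_test())
```

A production load test should also check error rates and sustained throughput over time, not just latency percentiles.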
If your application already uses Provisioned Throughput, you'll need extra short-term Provisioned Throughput to cover load testing while your current Provisioned Throughput order handles production traffic.
6. (Optional) Run online evaluations
Move to online evaluation only if your offline evaluation shows high Gemini output quality and your application requires online evaluation.
Online evaluation is a specific type of online testing. Try to use your organization's existing tools and methods for online evaluation. For example:
- If your organization regularly performs A/B tests, perform one to compare your application's current version with the latest Gemini version.
- If your organization regularly uses canary deployments, use them with the latest models and measure changes in user behavior.
You can also do online evaluation by adding new feedback and measurement features to your application. Different applications need different feedback methods. For example:
- Adding thumbs-up and thumbs-down buttons next to model outputs and comparing the rates between an older model and the latest Gemini models.
- Showing users outputs from both the older model and the latest models side-by-side and asking them to pick their favorite.
- Tracking how often users override or manually adjust outputs from the older model versus the latest models.
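The first of these feedback methods reduces to simple rate arithmetic. The counts below are fabricated example data, and a real comparison should also include a statistical significance test before drawing conclusions:

```python
# Illustrative comparison of explicit feedback rates between the existing
# model and the latest Gemini version running in parallel ("shadow mode").
# All counts are fabricated example data.

def thumbs_up_rate(up: int, down: int) -> float:
    """Fraction of explicit feedback events that were thumbs-up."""
    return up / (up + down)

old_rate = thumbs_up_rate(up=410, down=190)   # existing model
new_rate = thumbs_up_rate(up=455, down=145)   # latest Gemini version
print(f"old: {old_rate:.2f}, new: {new_rate:.2f}")  # old: 0.68, new: 0.76
```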
These feedback methods often require running the latest Gemini version alongside your existing version. This parallel deployment is sometimes called "shadow mode" or "blue-green deployment."
If online evaluation results differ greatly from offline evaluation results, your offline evaluation isn't capturing key aspects of the live environment or user experience. Apply the online evaluation findings to create a new offline evaluation that covers the gap, then return to step 3.
If you use Provisioned Throughput, you may need to purchase additional short-term Provisioned Throughput to continue to meet your throughput requirements for users in online evaluation.
7. Deploy to production
Once your evaluation shows that the latest Gemini model performs as well as or better than an older model, replace the existing application version with the new version. Follow your organization's standard procedures for production rollout.
If you're using Provisioned Throughput, change your Provisioned Throughput order to your chosen Gemini model. If you're rolling out your application incrementally, use short-term Provisioned Throughput to meet throughput needs for two different Gemini models.
Improving model performance
As you migrate, apply these tips to achieve optimal performance from your chosen Gemini model:
- Check your system instructions, prompts, and few-shot learning examples for any inconsistencies, contradictions, or irrelevant instructions and examples.
- Test a more powerful model. For example, if you evaluated Gemini 2.0 Flash-Lite, try Gemini 2.0 Flash.
- Review automated evaluation results to ensure they match human judgment, especially results using a judge model. Ensure your judge model instructions are clear, consistent, and unambiguous.
- To improve judge model instructions, test the instructions with multiple humans working in isolation. If humans interpret the instructions differently and provide different judgments, your judge model instructions are unclear.
- Fine-tune the model.
- Examine evaluation outputs for patterns that show specific types of failures. Grouping failures by model, type, or category provides more targeted evaluation data, making it easier to adjust prompts to fix these errors.
- Ensure you are evaluating different generative AI components independently.
- Experiment with adjusting token sampling parameters.
Getting help
If you require assistance, Google Cloud offers support packages to meet your needs, such as 24/7 coverage, phone support, and access to a technical support manager. For more information, see Google Cloud Support.
What's next
- See the list of frequently asked questions.
- Migrate from the PaLM API to the Gemini API in Vertex AI.