This guide explains how to update your application to the latest Gemini version. It assumes your application already uses an older Gemini version. To learn how to start using Gemini in Vertex AI, see the Gemini API in Vertex AI quickstart.
This guide doesn't cover how to switch your application from the Vertex AI SDK to the current Google Gen AI SDK. For that information, see our Vertex AI SDK migration guide.
What changes should I expect?
Updating most generative AI applications to the latest Gemini version requires few code or prompt changes. However, some applications may require prompt adjustments. It's hard to predict these changes without first testing your prompts with the new version. Thorough testing is recommended before fully migrating. For tips on creating effective prompts, see our prompt strategy guidance. Use our prompt health checklist to help find and fix prompt issues.
You only need to make major code changes for certain breaking changes or to use new Gemini capabilities.
Which Gemini model should I migrate to?
The Gemini model you use depends on your application's needs. The following table compares the older Gemini 1.5 models with the latest Gemini models:
Feature | 1.5 Pro | 1.5 Flash | 2.0 Flash | 2.0 Flash-Lite | 2.5 Pro | 2.5 Flash | 2.5 Flash-Lite
---|---|---|---|---|---|---|---
Launch stage | Deprecated | Deprecated | Generally available | Generally available | Generally available | Generally available | Generally available
Context window, total token limit | 2,097,152 | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576
Output context length | 8,192 (default) | 8,192 (default) | 8,192 (default) | 8,192 (default) | 65,535 (default) | 65,535 (default) | 65,536 (default)
Recommended SDK | Vertex AI SDK | Vertex AI SDK | Gen AI SDK | Gen AI SDK | Gen AI SDK | Gen AI SDK | Gen AI SDK
Pricing units | Character | Character | Token | Token | Token | Token | Token
Retirement date | September 24, 2025 | September 24, 2025 | February 5, 2026 | February 25, 2026 | June 17, 2026 | June 17, 2026 | July 22, 2026

Per-model support for input and output modalities, Grounding with Google Search, function calling, code execution, context caching, batch prediction, the Live API*, fine-tuning, and latency varies by model; see each model's reference page for details.

\* The Live API is available as a preview offering as part of `gemini-live-2.5-flash` and `gemini-live-2.5-flash-preview-native-audio`.
Before you begin migrating
Before you start the migration process, you should consider the following:
- Information security (InfoSec), governance, and regulatory approvals
- Location availability
- Modality and tokenization-based pricing differences
- Purchase or change Provisioned Throughput orders
- Supervised fine-tuning
- Regression testing
InfoSec, governance, and regulatory approvals
Obtain approvals from your information security (InfoSec), risk, and compliance teams early. Cover any specific risk and compliance rules, especially in regulated industries like healthcare and finance.
Location availability
Google and Partner models and generative AI features on Vertex AI are available through specific regional endpoints and a global endpoint. Global endpoints cover the entire world and offer improved availability and reliability compared to single regions.
Regional endpoint availability varies by model. For details on each model, see our locations guide.
Modality and tokenization-based pricing differences
Pricing varies between each Gemini model. Our pricing page lists costs for all modalities (text, code, images, speech, etc.) per model.
Purchase or change Provisioned Throughput orders
If needed, purchase more Provisioned Throughput or change existing Provisioned Throughput orders.
Supervised fine-tuning
The latest Gemini models offer better output quality. This can mean your application no longer needs a fine-tuned model. If your application uses supervised fine-tuning with an older Gemini model, first test your application with the latest model without fine-tuning and evaluate the results.
If you choose to use supervised fine-tuning, you cannot move your existing tuned model from older Gemini versions. You need to run a new tuning job for the new Gemini version.
When tuning a new Gemini model, start with the default tuning settings. Don't reuse hyperparameter values from previous Gemini versions, because the tuning service is optimized for the latest versions. Reusing old settings is unlikely to give optimal results.
Regression testing
When upgrading to the latest Gemini version, you'll need three main types of regression tests:
- Code regression tests: Regression testing from a software engineering and developer operations (DevOps) perspective. This type of regression testing is always required.
- Model performance regression tests: Regression testing from a data science or machine learning perspective. This means ensuring that the new Gemini model version provides outputs that at least maintain the same level of quality as the previous version. Model performance regression tests are model evaluations done when a system or its underlying model changes. They include:
  - Offline performance testing: Tests that assert the quality of model outputs in a dedicated experimentation environment based on various model output quality metrics.
  - Online model performance testing: Tests that assert the quality of model outputs in a live, online deployment based on implicit or explicit user feedback.
- Load testing: These tests check how well the application handles many requests at once. Load testing is required for applications that use Provisioned Throughput.
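A model performance regression test can be reduced, at its simplest, to a gate that compares the new model's evaluation scores against a baseline. The following sketch is illustrative only: the scores, the mean-score metric, and the tolerance value are assumptions, not part of any Google SDK.

```python
# Minimal sketch of a model-performance regression gate. In practice,
# `baseline_scores` and `candidate_scores` would come from your evaluation
# pipeline (for example, per-example scores from an evaluation service);
# the values and tolerance below are illustrative assumptions.

def passes_regression_gate(baseline_scores, candidate_scores, tolerance=0.02):
    """Return True if the candidate model's mean score is no more than
    `tolerance` below the baseline model's mean score."""
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    candidate_mean = sum(candidate_scores) / len(candidate_scores)
    return candidate_mean >= baseline_mean - tolerance

# Example: the new model slightly outperforms the old one on average.
old_model_scores = [0.82, 0.75, 0.91, 0.68]
new_model_scores = [0.85, 0.74, 0.93, 0.70]
print(passes_regression_gate(old_model_scores, new_model_scores))  # True
```

A real gate would typically also use per-task breakdowns and statistical significance rather than a single mean.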
How to migrate to the latest version
The following sections outline the steps to migrate to the latest Gemini version. For optimal results, complete these steps in order.
1. Document model evaluation and testing requirements
- Prepare to repeat any relevant evaluations you performed when you first built your application, plus any evaluations performed since then.
- If your current evaluations don't fully cover or measure all tasks your application performs, design and prepare more evaluations. You can use our evaluation playbook and our evaluation recipes to help you get started.
- If your application involves RAG, tool use, complex agentic workflows, or prompt chains, make sure that your existing evaluation data allows for assessing each component independently. If not, gather input-output examples for each component.
- If your application is critical or part of a larger user-facing real-time system, include online evaluation.
2. Make code upgrades and run tests
Upgrading your code requires three main changes: upgrading to the Google Gen AI SDK, changing your Gemini calls, and fixing breaking code changes. The following sections go over these changes in further detail.
Upgrade to the Google Gen AI SDK
If your Gemini 1.x application uses the Vertex AI SDK, switch to the Gen AI SDK. See our Vertex AI SDK migration guide for details, including code examples for making similar calls with the Gen AI SDK. Vertex AI SDK releases after June 2026 won't support Gemini, and new Gemini features are only available in the Gen AI SDK.
If you're new to the Gen AI SDK, see the Getting started with Google Generative AI using the Gen AI SDK notebook.
Change your Gemini calls
Update your prediction code to use one of the latest Gemini models. At a minimum, this means changing the model endpoint name.
The exact code changes will vary based on how you built your application, especially whether you used the Gen AI SDK or the Vertex AI SDK.
After making code changes, run code regression tests and other software tests to ensure your code functions as expected. This step checks whether the code functions, but not the quality of model responses.
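As a sketch, the minimal change is swapping in the new model endpoint name. The mapping helper below is hypothetical (choose your own target model from the comparison table earlier in this guide), and the client calls are commented out because they require the `google-genai` package and Google Cloud credentials:

```python
# Hypothetical mapping from retired model IDs to candidate replacements;
# this is an illustration, not an official migration table.
MODEL_REPLACEMENTS = {
    "gemini-1.5-pro": "gemini-2.5-pro",
    "gemini-1.5-flash": "gemini-2.5-flash",
}

def replacement_model(model_id: str) -> str:
    """Return the candidate replacement for an older model ID,
    or the ID unchanged if it isn't in the mapping."""
    return MODEL_REPLACEMENTS.get(model_id, model_id)

# Requires `pip install google-genai` and application default credentials:
# from google import genai
# client = genai.Client(vertexai=True, project="your-project-id", location="global")
# response = client.models.generate_content(
#     model=replacement_model("gemini-1.5-flash"),
#     contents="Summarize this release note in one sentence.",
# )
# print(response.text)
```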
Fix breaking code changes
- Dynamic retrieval: Switch to using Grounding with Google Search. This feature requires the Gen AI SDK and isn't supported by the Vertex AI SDK.
- Content filters: Note the default content filter settings. Change your code if it relies on a default that has changed.
- `Top-K` token sampling parameter: Models after `gemini-1.0-pro-vision` don't support changing the `Top-K` parameter.
For this step, focus only on code changes. You may need to make other changes later, but wait until you start your evaluation. After your evaluations, consider these adjustments based on the evaluation results:
- If you're switching from dynamic retrieval, you may need to adjust your system instructions to control when Google Search is used (for example, "Only generate queries for the Google Search tool if the user asks about sports. Don't generate queries for any other topic."). However, wait until you evaluate before changing prompts.
- If you used the `Top-K` parameter, adjust other token sampling parameters, such as `Top-P`, to get similar results.
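These two adjustments can be sketched together as follows. The sampling values are illustrative starting points only, not recommendations, and the client calls are commented out because they require the `google-genai` package and credentials:

```python
# Sketch: replacing dynamic retrieval with Grounding with Google Search, and
# compensating for the removed Top-K parameter by tuning Top-P (and
# temperature) instead. The values below are illustrative assumptions.
sampling = {"temperature": 0.7, "top_p": 0.9}

# Requires `pip install google-genai` and application default credentials:
# from google import genai
# from google.genai.types import GenerateContentConfig, GoogleSearch, Tool
#
# client = genai.Client(vertexai=True, project="your-project-id", location="global")
# response = client.models.generate_content(
#     model="gemini-2.5-flash",
#     contents="Who won the most recent World Cup?",
#     config=GenerateContentConfig(
#         tools=[Tool(google_search=GoogleSearch())],  # replaces dynamic retrieval
#         **sampling,
#     ),
# )
# print(response.text)
```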
3. Run offline evaluations
Repeat the evaluations you performed when you first developed and launched your application, any offline evaluations done since then, and any additional evaluations you identified in step 1. If you still feel your evaluation doesn't fully cover your application's scope, conduct further evaluations.
If you don't have an automated way to run offline evaluations, consider using the Gen AI evaluation service.
If your application uses fine-tuning, perform offline evaluation before re-tuning your model with the latest version of Gemini. The latest models offer improved output quality, which can mean your application no longer needs a fine-tuned model.
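If you don't yet have any automated evaluation at all, even a very simple scoring harness is better than none. The sketch below uses a normalized exact-match metric and made-up evaluation data; real offline evaluations (for example, with the Gen AI evaluation service) support much richer metrics:

```python
# Minimal offline evaluation sketch: score candidate outputs against
# reference answers with a normalized exact-match metric.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())

def exact_match_rate(predictions, references):
    """Fraction of predictions that exactly match their reference
    after normalization."""
    matches = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return matches / len(references)

# Illustrative data; replace with your application's outputs and references.
references = ["Paris", "4", "blue whale"]
predictions = ["paris", "4", "the blue whale"]
print(exact_match_rate(predictions, references))  # 2 of 3 match
```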
4. Assess evaluation results and tune your prompts and hyperparameters
If your offline evaluation shows that your application performs less effectively with the new model, improve your application until its performance matches the older model. Do this by:
- Iteratively refining your prompts to boost performance (known as "hill climbing"). If you're new to hill climbing, see the Vertex Gemini hill climbing online training. The Vertex AI prompt optimizer (example notebook) can also help.
- Experimenting with prompt and token sampling parameter adjustments, if your application is affected by the dynamic retrieval and `Top-K` breaking changes.
5. Run load tests
If your application needs a certain minimum throughput, perform load testing to ensure the latest version of your application meets your throughput requirements.
Load testing must occur before online evaluation, because online evaluation involves exposing the model to live traffic. Use your existing load testing tools and instrumentation for this step.
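If you don't have dedicated load testing tooling, the core idea can be sketched with a thread pool and per-request latency percentiles. The `call_model` function below is a stub standing in for your real prediction call, and the request counts, concurrency, and simulated latency are assumptions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    """Stub standing in for a real Gemini call; replace the sleep with your
    actual client.models.generate_content(...) invocation."""
    time.sleep(0.01)  # simulate network + inference latency
    return "ok"

def run_load_test(num_requests: int = 50, concurrency: int = 10):
    """Fire `num_requests` requests with up to `concurrency` in flight,
    and report p50/p95 latency in seconds."""
    def timed_call(i):
        start = time.monotonic()
        call_model(f"request {i}")
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(num_requests)))
    return {
        "p50": latencies[len(latencies) // 2],
        "p95": latencies[int(len(latencies) * 0.95)],
    }

print(run_load_test())
```

A production load test should also check error rates and sustained throughput over time, not just latency percentiles.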
If your application already uses Provisioned Throughput, you'll need extra short-term Provisioned Throughput to cover load testing while your current Provisioned Throughput order handles production traffic.
6. (Optional) Run online evaluations
Move to online evaluation only if your offline evaluation shows high Gemini output quality and your application requires online evaluation.
Online evaluation is a specific type of online testing. Try to use your organization's existing tools and methods for online evaluation. For example:
- If your organization regularly performs A/B tests, perform one to compare your application's current version with the latest Gemini version.
- If your organization regularly uses canary deployments, use them with the latest models and measure changes in user behavior.
You can also do online evaluation by adding new feedback and measurement features to your application. Different applications need different feedback methods. For example:
- Adding thumbs-up and thumbs-down buttons next to model outputs and comparing the rates between an older model and the latest Gemini models.
- Showing users outputs from both the older model and the latest models side-by-side and asking them to pick their favorite.
- Tracking how often users override or manually adjust outputs from the older model versus the latest models.
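The first of these feedback methods reduces to simple rate arithmetic. The counts below are fabricated example data, and a real comparison should also include a statistical significance test before drawing conclusions:

```python
# Illustrative comparison of explicit feedback rates between the existing
# model and the latest Gemini version running in parallel ("shadow mode").
# All counts are fabricated example data.

def thumbs_up_rate(up: int, down: int) -> float:
    """Fraction of explicit feedback events that were thumbs-up."""
    return up / (up + down)

old_rate = thumbs_up_rate(up=410, down=190)   # existing model
new_rate = thumbs_up_rate(up=455, down=145)   # latest Gemini version
print(f"old: {old_rate:.2f}, new: {new_rate:.2f}")  # old: 0.68, new: 0.76
```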
These feedback methods often require running the latest Gemini version alongside your existing version. This parallel deployment is sometimes called "shadow mode" or "blue-green deployment."
If online evaluation results differ greatly from offline evaluation results, your offline evaluation isn't capturing key aspects of the live environment or user experience. Apply the online evaluation findings to create a new offline evaluation that covers the gap, then return to step 3.
If you use Provisioned Throughput, you may need to purchase additional short-term Provisioned Throughput to continue to meet your throughput requirements for users in online evaluation.
7. Deploy to production
Once your evaluation shows that the latest Gemini model performs as well as or better than an older model, replace the existing application version with the new version. Follow your organization's standard procedures for production rollout.
If you're using Provisioned Throughput, change your Provisioned Throughput order to your chosen Gemini model. If you're rolling out your application incrementally, use short-term Provisioned Throughput to meet throughput needs for two different Gemini models.
Improving model performance
As you migrate, apply these tips to achieve optimal performance from your chosen Gemini model:
- Check your system instructions, prompts, and few-shot learning examples for any inconsistencies, contradictions, or irrelevant instructions and examples.
- Test a more powerful model. For example, if you evaluated Gemini 2.0 Flash-Lite, try Gemini 2.0 Flash.
- Review automated evaluation results to ensure they match human judgment, especially results using a judge model. Ensure your judge model instructions are clear, consistent, and unambiguous.
- To improve judge model instructions, test the instructions with multiple humans working in isolation. If humans interpret the instructions differently and provide different judgments, your judge model instructions are unclear.
- Fine-tune the model.
- Examine evaluation outputs for patterns that show specific types of failures. Grouping failures by model, type, or category provides more targeted evaluation data, making it easier to adjust prompts to fix these errors.
- Ensure you are evaluating different generative AI components independently.
- Experiment with adjusting token sampling parameters.
Getting help
If you require assistance, Google Cloud offers support packages to meet your needs, such as 24/7 coverage, phone support, and access to a technical support manager. For more information, see Google Cloud Support.
What's next
- See the list of frequently asked questions.
- Migrate from the PaLM API to the Gemini API in Vertex AI.