Compare prompts

This guide shows how to use the Compare feature in Vertex AI to evaluate and iterate on your prompts. The Compare feature lets you view prompts and their responses side by side to see how a different prompt, model, or parameter setting changes the model's output.
You can compare prompts using the following methods:
| Comparison method | Description | Use case |
| --- | --- | --- |
| Compare with a new prompt | Compare a saved prompt against a new, unsaved prompt. | For quick iterations and testing small changes to an existing prompt without saving each version. |
| Compare with another saved prompt | Compare two existing, saved prompts side by side. | For evaluating two distinct, well-defined prompt versions or approaches that you have saved previously. |
| Compare with a ground truth | Compare a prompt's output against a predefined, ideal answer. | For quantitative evaluation and scoring of a model's response against a benchmark or "correct" answer. |
Before you begin

The Compare feature doesn't support prompts that include media, or chat prompts that have more than one exchange.

To access the Compare feature, follow these steps:

1. In the Google Cloud console, go to the Create prompt page.
2. Select Compare. The Compare page appears.
Create a prompt in the Compare feature

On the Compare page, you can create and save a prompt before you compare it with another prompt.
Compare with a new prompt

To compare your saved prompt with a new prompt, follow these steps:

1. Optional: To configure the output, expand Outputs and set the Thinking budget. Change the budget to one of the available settings. For more information on token limits for each model, see Control the thinking budget.
2. Optional: To add tools, expand Tools and select a grounding option.
3. Optional: To configure advanced settings, expand Advanced and set the following options:
   - Safety Filter Settings: Keep the default of Off, or select Block few, Block some, or Block most for each category.
   - Temperature: Controls the randomness in token selection. A lower temperature is better for responses that need to be correct, while a higher temperature can lead to more diverse or unexpected results.
   - Output token limit: Determines the maximum amount of text output from one prompt. A token is approximately four characters.
   - Max responses: The maximum number of model responses generated per prompt. Responses can still be blocked due to safety filters or other policies.
   - Top-P: Changes how the model selects tokens for output.
   - Stream model responses: If selected, responses are displayed as they're generated.
   - Add stop sequence: Enter a sequence that signals the model to stop generating content. Press Enter after each sequence.
4. Click Save to save changes to your settings.
5. Click Apply.
6. Click Submit prompts to compare the prompts and their responses.
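These console settings map closely to the generation parameters that the Vertex AI Python SDK exposes. The following sketch is not part of the console workflow; it only illustrates how comparable temperature, top-P, token limit, response count, stop sequence, and safety values could be applied programmatically, with the response streamed as it's generated. The project ID, region, model name, prompt text, and the mapping of Block few to a threshold are placeholder assumptions; adjust them for your environment and SDK version.

```python
# A minimal sketch (not the console feature itself): applying comparable
# settings with the Vertex AI Python SDK. The project ID, region, and model
# name are placeholders -- substitute your own values.
import vertexai
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    SafetySetting,
)

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

model = GenerativeModel("gemini-2.0-flash-001")  # any supported model name

config = GenerationConfig(
    temperature=0.2,          # lower values favor more deterministic output
    top_p=0.95,               # nucleus sampling threshold (Top-P)
    max_output_tokens=1024,   # output token limit; a token is ~4 characters
    candidate_count=1,        # closest analog to "Max responses"
    stop_sequences=["###"],   # generation stops when this sequence appears
)

safety = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,  # roughly "Block few"
    ),
]

# stream=True mirrors "Stream model responses": chunks print as they arrive.
for chunk in model.generate_content(
    "Summarize the benefits of iterating on prompts.",
    generation_config=config,
    safety_settings=safety,
    stream=True,
):
    print(chunk.text, end="")
```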
Compare with another saved prompt

Instead of comparing with a new prompt, you can compare your saved prompt with another prompt that you have saved previously.
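For a quick programmatic counterpart to comparing two saved prompts, for example while prototyping in a notebook, a loop like the one below sends both prompt texts to the same model with identical settings and prints the responses together. This is an illustrative sketch only; the prompt texts and model name are made up, it assumes the same vertexai setup as the previous example, and it is not how the Compare page works internally.

```python
# Illustrative only: send two prompt variants with identical settings and
# print the responses next to each other for manual comparison.
# Assumes vertexai.init() has already been called as in the previous sketch.
from vertexai.generative_models import GenerationConfig, GenerativeModel

model = GenerativeModel("gemini-2.0-flash-001")  # placeholder model name
config = GenerationConfig(temperature=0.2, max_output_tokens=512)

prompts = {
    "Saved prompt": "Write a product description for a reusable water bottle.",
    "Revised prompt": (
        "Write a three-sentence product description for a reusable water "
        "bottle aimed at hikers."
    ),
}

for label, prompt in prompts.items():
    response = model.generate_content(prompt, generation_config=config)
    print(f"=== {label} ===\n{response.text}\n")
```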
Compare with a ground truth

A ground truth is your preferred, high-quality answer to a prompt. When you provide a ground truth, all other model responses are evaluated against it. The evaluation metrics generated from a ground truth comparison are not affected by the selected region.
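Outside the console, you can score responses against a ground truth with the Gen AI evaluation service. The sketch below assumes the vertexai.evaluation module, a small in-memory dataset, and computation-based metrics; the column names, metric identifiers, and result fields are assumptions that can vary by SDK version, and the scores are not necessarily the same metrics that the Compare page reports.

```python
# A minimal sketch of scoring candidate responses against a ground truth
# ("reference") with computation-based metrics. The metric names and the
# "response"/"reference" column convention are assumptions about the
# vertexai.evaluation SDK surface; verify them against your SDK version.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

ground_truth = "The Eiffel Tower is 330 meters tall."
candidates = [
    "The Eiffel Tower stands about 330 meters tall.",
    "The Eiffel Tower is in Paris.",
]

eval_dataset = pd.DataFrame(
    {
        "response": candidates,
        "reference": [ground_truth] * len(candidates),
    }
)

# Computation-based metrics compare each response with the reference answer.
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["exact_match", "bleu", "rouge_l_sum"],
)
result = eval_task.evaluate()

print(result.summary_metrics)  # aggregate scores
print(result.metrics_table)    # per-row scores for each candidate
```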