Compare prompts

This guide shows how to use the Compare feature in Vertex AI to evaluate and iterate on your prompts. The Compare feature lets you view prompts and their responses side-by-side to see how a different prompt, model, or parameter setting changes the model's output.

You can compare prompts using the following methods:

  • Compare with a new prompt: Compare a saved prompt against a new, unsaved prompt. Use this for quick iterations and for testing small changes to an existing prompt without saving each version.
  • Compare with another saved prompt: Compare two existing, saved prompts side-by-side. Use this to evaluate two distinct, well-defined prompt versions or approaches that you saved previously.
  • Compare with a ground truth: Compare a prompt's output against a predefined, ideal answer. Use this for quantitative evaluation and scoring of a model's response against a benchmark or "correct" answer.

The Compare feature doesn't support prompts with media or chat prompts that have more than one exchange.

Before you begin

To access the Compare feature, follow these steps:

  1. In the Google Cloud console, go to the Create prompt page.

    Go to Create prompt

  2. Select Compare. The Compare page appears.

Create a prompt in the Compare feature

On the Compare page, you can create and save a prompt before you compare it with another prompt.

To create a prompt, follow these steps:

  1. In the New Prompt field, enter your prompt.
  2. Click Submit prompts. The model's response appears below the prompt text.
  3. Click Save as new. The Save prompt dialog appears.
  4. In the Prompt name field, enter a name for your new prompt.
  5. In the Region field, select your region, or leave it as the default.
  6. Optional: To encrypt your prompt with a customer-managed encryption key (CMEK), do the following:
    1. Select the Customer-managed encryption key (CMEK) checkbox.
    2. From the Select a Cloud KMS key field, select a key.
  7. Click Save. Your prompt is saved and is available to select from the Existing Prompt pane when you compare saved prompts.
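
If you prefer to iterate on prompt text in a script before saving it in the console, you can reproduce the same prompt-and-response step programmatically. The following is a minimal sketch that uses the google-genai Python SDK with Vertex AI; the project ID, location, and model name are placeholder assumptions, not values from this guide.

    # Minimal sketch: send a prompt to a model on Vertex AI and print the
    # response, mirroring the "enter prompt, then Submit prompts" step above.
    # Assumes the google-genai SDK is installed and Application Default
    # Credentials are configured; the project and model names are placeholders.
    from google import genai

    client = genai.Client(vertexai=True, project="my-project", location="us-central1")

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Summarize the benefits of prompt versioning in two sentences.",
    )
    print(response.text)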

Compare with a new prompt

To compare your saved prompt with a new prompt, follow these steps:

  1. Click Compare new prompt. The Compare pane appears.
  2. Optional: To use a different model, click Switch model.
  3. Optional: To configure the output, expand Outputs and set the following options:

    • Structured output: To make the model return its output in a specific format, such as JSON, click the Structured output toggle (shown in the configuration sketch after these steps). Grounding options are turned off when you select this option, because grounding isn't supported with structured output.
    • Thinking budget: Change the budget to one of the following:

      • Auto: The model adjusts the thinking budget based on the complexity of the prompt.
      • Manual: Lets you manually adjust the thinking budget in tokens.
      • Off: Turns off thinking, so no thinking budget is used.
  4. Optional: To add tools, expand Tools and select one of the following grounding options:

    • Grounding: Google: Grounding with Google Search or Google Maps.
    • Grounding: Your data: Grounding with Vertex AI RAG Engine, Vertex AI Search, or Elasticsearch. If you select this option, select the data source that you want to use.
  5. Optional: To configure advanced settings, expand Advanced and set the following options:

    • Region: Select a region.
    • Safety Filter Settings: Keep the default of Off, or select Block few, Block some, or Block most for the following categories:

      • Hate speech: Negative or harmful comments targeting identity or protected attributes.
      • Dangerous content: Promotes or enables access to harmful goods, services, and activities.
      • Sexually explicit content: Contains references to sexual acts or other lewd content.
      • Harassment content: Malicious, intimidating, bullying, or abusive comments targeting another individual.
    • Temperature: Controls the randomness in token selection. A lower temperature is better for responses that need to be correct, while a higher temperature can lead to more diverse or unexpected results.

    • Output token limit: Determines the maximum amount of text output from one prompt. A token is approximately four characters.

    • Max responses: The maximum number of model responses generated per prompt. Responses can still be blocked due to safety filters or other policies.

    • Top-P: Changes how the model selects tokens for output. Tokens are selected from the most probable to the least probable until the sum of their probabilities equals the Top-P value.

    • Stream model responses: If selected, responses are displayed as they're generated.

    • Add stop sequence: Enter a sequence that signals the model to stop generating content. Press Enter after each sequence.

  6. Click Save to save changes to your settings.

  7. Click Apply.

  8. Click Submit prompts to compare the prompts and their responses.

For more information about the thinking budget token limits for each model, see Control the thinking budget.
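
The output, tool, and advanced settings in steps 3 through 5 correspond roughly to request parameters when you call the model programmatically. The following sketch uses the google-genai Python SDK; the project ID, model name, and the specific values shown are placeholder assumptions rather than recommendations from this guide.

    # Sketch of the console's Outputs, Tools, and Advanced settings expressed
    # as request parameters with the google-genai SDK. The project, model, and
    # values below are illustrative placeholders.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project="my-project", location="us-central1")

    config = types.GenerateContentConfig(
        # Structured output: ask for JSON instead of free-form text.
        response_mime_type="application/json",
        # Thinking budget: "Manual" in the console corresponds to an explicit
        # token budget here; "Off" would be thinking_budget=0 on supported models.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
        # Safety filter settings, one entry per harm category.
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            ),
        ],
        # Sampling and length controls: Temperature, Top-P, Output token limit,
        # and stop sequences.
        temperature=0.2,
        top_p=0.95,
        max_output_tokens=1024,
        stop_sequences=["###"],
        # Grounding with Google Search would be added as a tool, but the console
        # turns grounding off when structured output is selected, so it is
        # omitted here:
        # tools=[types.Tool(google_search=types.GoogleSearch())],
    )

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Return a JSON object with keys 'title' and 'summary' describing prompt versioning.",
        config=config,
    )
    print(response.text)

    # For streamed output (the console's "Stream model responses" option), use
    # client.models.generate_content_stream() and iterate over the chunks.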

Compare with another saved prompt

To compare your saved prompt with another saved prompt, follow these steps:

  1. Click Compare saved prompt. The Existing Prompt pane appears.
  2. Select up to two existing prompts to compare:
    1. Select a Prompt name. If you have many prompts in your list, click in the Filter field, and select the property that you want to filter by. Enter a value, and press Enter.
    2. Click Apply. The Compare page displays the selected prompts for a side-by-side comparison.
  3. Click Submit prompts to compare the prompts and their responses.

Compare with a ground truth

A ground truth is your preferred, high-quality answer to a prompt. When you provide a ground truth, all other model responses are evaluated against it.

To compare a prompt's response with a ground truth, follow these steps:

  1. Click Ground truth. The Ground truth pane appears.
  2. Enter your ground truth answer. This is used to generate evaluation metrics.
  3. Click Save to save the ground truth.
  4. Click Submit prompts to compare the prompts and their responses.

The evaluation metrics generated from a ground truth comparison are not affected by the selected region.
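
If you want a rough, scriptable version of the same idea (scoring candidate responses against a single preferred answer), the following sketch uses only the Python standard library. It illustrates the concept of ground-truth comparison; it is not the evaluation metric that the Compare page computes.

    # Conceptual sketch: rank candidate responses by textual similarity to a
    # ground truth answer. Uses difflib from the standard library; the
    # responses and ground truth below are made-up examples.
    from difflib import SequenceMatcher

    ground_truth = (
        "Prompt versioning lets you save, compare, and roll back prompt changes."
    )

    candidates = {
        "prompt_v1": "Versioning prompts lets you save and compare prompt changes over time.",
        "prompt_v2": "A prompt is a text instruction that you send to a model.",
    }

    for name, response in candidates.items():
        # ratio() returns a similarity score between 0 and 1.
        score = SequenceMatcher(None, response, ground_truth).ratio()
        print(f"{name}: similarity to ground truth = {score:.2f}")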

What's next