Data-driven prompt optimizer

This guide shows you how to use the data-driven optimizer to automatically improve prompt performance by optimizing system instructions and selecting few-shot examples.
How optimization works

The data-driven optimizer helps you improve your prompts quickly and at scale, without manually rewriting system instructions or individual prompts. This is especially useful when adapting prompts written for one model to a different model.

For example, to optimize system instructions for a set of prompts that answer cooking questions using contextual information, you prepare the following inputs for the data-driven optimizer: system instructions, a prompt template, and sample prompts. The optimizer returns optimized system instructions as its output. The Prompt optimization example section later in this guide shows each of these inputs and the resulting output. The parameters that control the optimization job are described in Create a configuration.

When you run the optimizer, it starts a custom training job that iteratively evaluates your sample prompts and rewrites your system instructions to find the version that produces the best evaluation score for the target model. At the end of the job, the optimizer outputs the optimized system instructions and their evaluation score.
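The following sketch illustrates that loop conceptually. It is not the optimizer's actual algorithm: the real job generates candidate instructions with a model and scores them with the evaluation metrics you configure, and the evaluate and rewrite callables here are hypothetical stand-ins.

def optimize_instructions(initial_instruction, sample_prompts, evaluate, rewrite, steps=10):
    """Conceptual sketch of iterative instruction optimization (illustrative only)."""
    best_instruction = initial_instruction
    best_score = evaluate(best_instruction, sample_prompts)
    for _ in range(steps):
        candidate = rewrite(best_instruction)         # propose a revised instruction
        score = evaluate(candidate, sample_prompts)   # score it against the sample prompts
        if score > best_score:                        # keep the best-scoring version so far
            best_instruction, best_score = candidate, score
    return best_instruction, best_score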
Evaluation metrics

The data-driven optimizer uses evaluation metrics to optimize system instructions and select sample prompts. You can use the standard evaluation metrics or define your own custom evaluation metrics. Note: All evaluation metrics must have the property that a higher score indicates better performance.

You can use multiple metrics at a time. However, you can use only one custom metric at a time: if you combine standard and custom metrics, only one of them can be a custom metric, and the rest must be standard metrics. To learn how to specify metrics individually or in combination, see EVALUATION_METRIC_PARAMETERS in the SDK tab of Create a configuration.

Custom evaluation metrics

Custom metrics are useful when standard metrics don't fit your application. Keep in mind that the data-driven optimizer supports only one custom metric at a time. To learn how to create custom metrics, see Create custom metrics.

For the list of supported standard evaluation metrics, see the Standard evaluation metrics table later in this guide.
Prompt optimization example

The following example shows the inputs that you prepare and the optimized output for the cooking scenario described in How optimization works.

System instructions:
You are a professional chef. Your goal is teaching how to cook healthy cooking recipes to your apprentice.
Given a question from your apprentice and some context, provide the correct answer to the question.
Use the context to return a single and correct answer with some explanation.
Prompt template:

Question: {input_question}
Facts: {input_context}

Sample prompts:

Sample prompt 1

input_question:
What are some techniques for cooking red meat and pork that maximize
flavor and tenderness while minimizing the formation of unhealthy
compounds?
input_context:
Red meat and pork should be cooked to an internal temperature of 145
degrees Fahrenheit (63 degrees Celsius) to ensure safety.
Marinating meat in acidic ingredients like lemon juice or vinegar can help
tenderize it by breaking down tough muscle fibers. High-heat cooking methods
like grilling and pan-searing can create delicious browning and
caramelization, but it's important to avoid charring, which can produce
harmful compounds.

Sample prompt 2

input_question:
What are some creative ways to add flavor and nutrition to protein shakes
without using added sugars or artificial ingredients?
input_context:
Adding leafy greens like spinach or kale is a great way to boost the
nutritional value of your shake without drastically altering the flavor.
Using unsweetened almond milk or coconut water instead of regular milk can add
a subtle sweetness and a boost of healthy fats or electrolytes, respectively.
Did you know that over-blending your shake can actually heat it up? To keep
things cool and refreshing, blend for shorter bursts and give your blender a
break if needed.

Optimized system instructions:

As a highly skilled chef with a passion for healthy cooking, you love sharing your knowledge with
aspiring chefs. Today, a culinary intern approaches you with a question about healthy cooking. Given
the intern's question and some facts, provide a clear, concise, and informative answer that will help
the intern excel in their culinary journey.
Standard evaluation metrics

The data-driven optimizer supports the following standard evaluation metrics:
Metric type | Use case | Metric | Description
Model-based | Summarization | summarization_quality | Describes the model's ability to summarize text.
Model-based | Question answering | question_answering_correctness* | Describes the model's ability to correctly answer a question.
Model-based | Question answering | question_answering_quality | Describes the model's ability to answer questions given a body of text to reference.
Model-based | Coherence | coherence | Describes the model's ability to provide a coherent response and measures how well the generated text flows logically and makes sense.
Model-based | Safety | safety | Describes the model's level of safety, that is, whether the response contains any unsafe text.
Model-based | Fluency | fluency | Describes the model's language mastery.
Model-based | Groundedness | groundedness | Describes the model's ability to provide or reference information included only in the input text.
Model-based | Comet | comet** | Describes the quality of a translation against a reference translation.
Model-based | MetricX | metricx** | Describes the quality of a translation.
Computation-based | Tool use and function calling | tool_call_valid* | Describes the model's ability to predict a valid tool call.
Computation-based | Tool use and function calling | tool_name_match* | Describes the model's ability to predict a tool call with the correct tool name. Only the first tool call is inspected.
Computation-based | Tool use and function calling | tool_parameter_key_match* | Describes the model's ability to predict a tool call with the correct parameter names.
Computation-based | Tool use and function calling | tool_parameter_kv_match* | Describes the model's ability to predict a tool call with the correct parameter names and key values.
Computation-based | General text generation | bleu* | Holds the result of an algorithm for evaluating the quality of the prediction, which has been translated from one natural language to another natural language. The quality of the prediction is considered to be the correspondence between a prediction parameter and its reference parameter.
Computation-based | General text generation | exact_match* | Computes whether a prediction parameter matches a reference parameter exactly.
Computation-based | General text generation | rouge_1* | Used to compare the provided prediction parameter against a reference parameter.
Computation-based | General text generation | rouge_2* | Used to compare the provided prediction parameter against a reference parameter.
Computation-based | General text generation | rouge_l* | Used to compare the provided prediction parameter against a reference parameter.
Computation-based | General text generation | rouge_l_sum* | Used to compare the provided prediction parameter against a reference parameter.
* To use question_answering_correctness or computation-based metrics, you need to do one of the following:
  * Add a variable that represents the ground truth response to your prompt template.
  * If you don't have ground truth responses but have previously achieved good results with a specific model, add the source_model parameter to your configuration. The optimizer uses this source model to generate the ground truth responses for you.

** To use comet or metricx, you need to provide the translation_source_field_name parameter in your configuration. This parameter specifies the field name of the source text in your data. Note that the MetricX score is scaled to a range of 0 (worst) to 25 (best).

Before you begin

To ensure that the Compute Engine default service account has the necessary permissions to optimize prompts, ask your administrator to grant the Compute Engine default service account the following IAM roles on the project:
* Vertex AI User (roles/aiplatform.user)
* Storage Object Admin (roles/storage.objectAdmin)
* Artifact Registry Reader (roles/artifactregistry.reader)
* Cloud Run Developer (roles/run.developer)
* Cloud Run Invoker (roles/run.invoker)
* Vertex AI Service Agent (roles/aiplatform.serviceAgent)

For more information about granting roles, see Manage access to projects, folders, and organizations. Your administrator might also be able to give the Compute Engine default service account the required permissions through custom roles or other predefined roles.
Optimize prompts

You can optimize prompts using the methods described in the following table.

Method | Description | Use case
Notebook | Use a guided Colab Enterprise notebook that combines code, explanatory text, and visualizations. | Best for first-time users and those who prefer an interactive, guided experience for experimentation.
REST API | Programmatically create and manage optimization jobs by sending JSON requests to the Vertex AI API endpoint. | Ideal for developers who need to integrate prompt optimization into automated workflows, custom applications, or CI/CD pipelines.
Vertex AI SDK for Python | Use the Python SDK in your own environment (like a local notebook or script) to create and manage optimization jobs. | Suitable for data scientists and developers who prefer programmatic control within a Python environment.

To optimize prompts, choose your preferred method and complete the following steps.
Create a prompt template and system instructions

Prompt templates define the format of your prompts through replaceable variables. When you run the optimizer, these variables are replaced by the data in your sample prompts dataset. Prompt template variables need to meet the following requirements:

* Variables must be enclosed in curly braces ({}).
* Variables for multimodal inputs must include the MIME_TYPE string after the variable:

@@@MIME_TYPE

Replace MIME_TYPE with a supported image, video, audio, or document MIME type.

Create a prompt template and system instructions using one of the following methods:

Notebook

To create system instructions and a prompt template in the notebook:

* In Colab Enterprise, open the Vertex AI prompt optimizer notebook.
* In the Create a prompt template and system instructions section, do the following:
  * In the SYSTEM_INSTRUCTION field, enter your system instructions. For example:
Based on the following images and articles respond to the questions.'\n' Be concise,
and answer \"I don't know\" if the response cannot be found in the provided articles or images.
  * In the PROMPT_TEMPLATE field, enter your prompt template. For example:

Article 1:\n\n{article_1}\n\nImage 1:\n\n{image_1} @@@image/jpeg\n\nQuestion: {question}
If you want to use question_answering_correctness or computation-based evaluations, you need to do one of the following:

* Add the {target} variable to the prompt template to represent the prompt's ground truth response. For example:

Article 1:\n\n{article_1}\n\nImage 1:\n\n{image_1} @@@image/jpeg\n\nQuestion: {question}\n\n Answer: {target}

* If you don't have ground truth responses but previously achieved your targeted results with a Google model, you can add the source_model parameter to your configuration. The optimizer uses this source model to generate the ground truth responses for you.

SDK

To run the optimizer through the SDK, create text files for your prompt template and system instructions:

* Create a text file for your system instructions and define them. For example:
Based on the following images and articles respond to the questions.'\n' Be concise, and answer \"I don't know\" if the response cannot be found in the provided articles or images.
* Create a text file for your prompt template and define a template with one or more variables. For example:

Article 1:\n\n{article_1}\n\nImage 1:\n\n{image_1} @@@image/jpeg\n\nQuestion: {question}
If you want to use question_answering_correctness or computation-based evaluations, you need to do one of the following:

* Add the {target} variable to the prompt template to represent the prompt's ground truth response. For example:

Article 1:\n\n{article_1}\n\nImage 1:\n\n{image_1} @@@image/jpeg\n\nQuestion: {question}\n\n Answer: {target}

* If you don't have ground truth responses but previously achieved your targeted results with a Google model, you can add the source_model parameter to your configuration. The optimizer uses this source model to generate the ground truth responses for you.

Prepare sample prompts

To get the best results from the data-driven optimizer, use 50-100 sample prompts. The sample prompts contain the data that replaces the variables in the prompt template. You can use a JSONL or CSV file to store your sample prompts.
JSONL file

In the JSONL file, add the prompt data that replaces each variable. For example:
{"article_1": "The marine life …", "image_1": "gs://path_to_image", "Question": "What are some most effective ways to reduce ocean pollution?", "target": "The articles and images don't answer this question."}
{"article_1": "During the year …", "image_1": "gs://path_to_image", "Question": "Who was the president in 2023?", "target": "Joe Biden"}
CSV file

In the CSV file, add a column for each prompt template variable and a row for each sample prompt.
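To see how a sample prompt fills the template, the following sketch is illustrative only: it is not part of the optimizer, the template and record mirror the examples above, and the optimizer performs this substitution for you during the job.

# Illustrative only: how one sample-prompt record fills the prompt template variables.
prompt_template = (
    "Article 1:\n\n{article_1}\n\nImage 1:\n\n{image_1} @@@image/jpeg\n\n"
    "Question: {question}\n\nAnswer: {target}"
)

record = {
    "article_1": "The marine life ...",
    "image_1": "gs://path_to_image",
    "question": "What are some most effective ways to reduce ocean pollution?",
    "target": "The articles and images don't answer this question.",
}

# Each {variable} in the template is replaced by the matching field from the record.
print(prompt_template.format(**record))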
Optional: Create custom metrics
To create a custom metric:

* Create a requirements.txt file and define the required libraries for your custom evaluation metric function. All functions require the functions-framework package. For example, the requirements.txt file for a custom metric that computes ROUGE-L would look like this:

functions-framework==3.*
rouge-score

* Create a Python file named main.py and write your custom evaluation function. The function needs to accept HTTP POST requests with a JSON input containing response (the LLM output) and reference (the ground truth response, if provided). For example, the main.py file for a custom metric that computes ROUGE-L would look like this:
import json

import functions_framework
from rouge_score import rouge_scorer

# Register an HTTP function with the Functions Framework
@functions_framework.http
def main(request):
    request_json = request.get_json(silent=True)
    if not request_json:
        raise ValueError('Cannot find request JSON.')

    # Extract 'response' and 'reference' from the request payload. 'response'
    # represents the model's response, while 'reference' represents the ground
    # truth response.
    response = request_json['response']
    reference = request_json['reference']

    # Compute the ROUGE-L F-measure
    scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
    scores = scorer.score(reference, response)
    final_score = scores['rougeL'].fmeasure

    # Return the custom score in the response
    return json.dumps({
        # The following key is the CUSTOM_METRIC_NAME that you pass to the job
        'custom_accuracy': final_score,
        # The following key is optional
        'explanation': 'ROUGE_L F-measure between reference and response',
    })
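Before you deploy, you can optionally sanity-check the scoring logic on your machine. The following sketch is illustrative only; it calls the same rouge_score API that the function uses, with two arbitrary example strings:

# Illustrative local check of the ROUGE-L scoring logic (requires rouge-score to be installed).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
scores = scorer.score(
    'Grill the salmon over medium heat for about four minutes per side.',   # reference
    'Grill the salmon on medium heat, roughly four minutes on each side.',  # response
)
print(scores['rougeL'].fmeasure)  # prints a value between 0.0 and 1.0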
Deploy your custom evaluation function as a Cloud Run function by running the gcloud functions deploy command:
gcloud functions deploy FUNCTION_NAME \
--project PROJECT_ID \
--gen2 \
--memory=2Gb \
--concurrency=6 \
--min-instances 6 \
--region=REGION \
--runtime="python310" \
--source="." \
--entry-point main \
--trigger-http \
--timeout=3600 \
--quiet
Replace the following:

* FUNCTION_NAME: The name for the custom evaluation metric.
* PROJECT_ID: Your project ID.
* REGION: The region where you want to deploy the function. It should be the same region as the target model.
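After the deployment finishes, you can optionally verify that the function returns the expected JSON shape. The following sketch is illustrative: FUNCTION_URL is a placeholder for the HTTPS trigger URL that the deploy command prints, and the snippet assumes that you are authenticated with gcloud locally and have the requests library installed.

# Illustrative check that the deployed function returns {"custom_accuracy": ..., "explanation": ...}.
import subprocess
import requests

FUNCTION_URL = "https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME"  # placeholder

# Use your local gcloud credentials to obtain an identity token for the call.
token = subprocess.check_output(
    ["gcloud", "auth", "print-identity-token"], text=True
).strip()

payload = {
    "response": "Sear the salmon for four minutes per side over medium heat.",
    "reference": "Grill the salmon on medium heat, about four minutes each side.",
}
resp = requests.post(
    FUNCTION_URL,
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
    timeout=60,
)
print(resp.json())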
Create a configuration

The configuration specifies the parameters for your prompt optimization job. Create a configuration using one of the following options:

Notebook

To create a configuration in the notebook:

* In Colab Enterprise, open the data-driven optimizer notebook.
* In the Configure project settings section, set the URI of your sample prompts file (for example, gs://bucket-name/sample-prompts.jsonl) and the Cloud Storage path where the optimizer writes its results (for example, gs://bucket-name/output-path).
* In the Configure optimization settings section, set the optimization mode: instruction, demonstration, or instruction_and_demo.
* Optional: In the Configure advanced optimization settings section, you can additionally set any of the following optional parameters:
  * The number of instruction optimization steps. Must be an integer between 10 and 20. If left unset, the default is 10.
  * The number of demonstration optimization steps, used with the demonstration and instruction_and_demo optimization modes. Must be an integer between 10 and 30. If left unset, the default is 10.
  * The number of demonstrations generated per prompt. Must be an integer between 2 and the total number of sample prompts - 1. If left unset, the default is 3.
  * The target model QPS. Must be a float of 3.0 or greater, but less than the QPS quota you have on the target model. If left unset, the default is 3.0.
  * The source model QPS. Must be a float of 3.0 or greater, but less than the QPS quota you have on the source model. If left unset, the default is 3.0.
  * The evaluation QPS. Must be a float of 3.0 or greater. If left unset, the default is 3.0.
  * The custom metric QPS. Must be a float of 3.0 or greater. This determines the rate at which the data-driven optimizer calls your custom metric Cloud Run functions.
  * The response MIME type for the target model: text/plain or application/json. If left unset, the default is text/plain.

SDK

To run the optimizer through the SDK, create a JSON configuration file with the parameters for your job. Create a JSON file with the following required parameters:
{
"project": "PROJECT_ID",
"system_instruction": "SYSTEM_INSTRUCTION",
"prompt_template": "PROMPT_TEMPLATE",
"target_model": "TARGET_MODEL",
"thinking_budget": "THINKING_BUDGET,
EVALUATION_METRIC_PARAMETERS,
"optimization_mode": "OPTIMIZATION_MODE",
"input_data_path": "SAMPLE_PROMPT_URI",
"output_path": "OUTPUT_URI"
}
Replace the following:

* PROJECT_ID: Your project ID.
* SYSTEM_INSTRUCTION: The system instructions to optimize.
* PROMPT_TEMPLATE: The prompt template.
* TARGET_MODEL: The model to optimize prompts for.
* THINKING_BUDGET: The thinking budget for the target model. Defaults to -1 (auto thinking for capable models). For more information, see Thinking.
* EVALUATION_METRIC_PARAMETERS: The parameters depend on your choice of evaluation metrics:

  Single standard metric

  If you're using a single standard evaluation metric, use the following parameter:

  "eval_metric": "EVALUATION_METRIC",

  Replace EVALUATION_METRIC with the metric you want to optimize for.

  Single custom metric

  If you're using a single custom evaluation metric, use the following parameters:

  "eval_metric": "custom_metric",
  "custom_metric_name": "CUSTOM_METRIC_NAME",
  "custom_metric_cloud_function_name": "FUNCTION_NAME",

  Replace the following:

  * CUSTOM_METRIC_NAME: The metric name, as defined by the key corresponding to the final_score in your function (for example, custom_accuracy).
  * FUNCTION_NAME: The name of the deployed Cloud Run function.

  Multiple standard metrics

  If you're using multiple standard evaluation metrics, use the following parameters:

  "eval_metrics_types": [EVALUATION_METRIC_LIST],
  "eval_metrics_weights": [EVAL_METRICS_WEIGHTS],
  "aggregation_type": "METRIC_AGGREGATION_TYPE",

  Replace the following:

  * EVALUATION_METRIC_LIST: A list of evaluation metrics (for example, "bleu", "summarization_quality").
  * EVAL_METRICS_WEIGHTS: The weight for each metric, as a list of the same length.
  * METRIC_AGGREGATION_TYPE: The aggregation type: weighted_sum or weighted_average. Default: weighted_sum.

  Multiple standard & custom metrics

  If you're using a mix of a single custom metric and one or more standard metrics, use the following parameters:

  "eval_metrics_types": ["custom_metric", EVALUATION_METRIC_LIST],
  "eval_metrics_weights": [EVAL_METRICS_WEIGHTS],
  "aggregation_type": "METRIC_AGGREGATION_TYPE",
  "custom_metric_name": "CUSTOM_METRIC_NAME",
  "custom_metric_cloud_function_name": "FUNCTION_NAME",

  Replace the following:

  * EVALUATION_METRIC_LIST: A list of the standard evaluation metrics.
  * EVAL_METRICS_WEIGHTS: The weight for each metric.
  * METRIC_AGGREGATION_TYPE: The aggregation type: weighted_sum or weighted_average. Default: weighted_sum.
  * CUSTOM_METRIC_NAME: The name of your custom metric.
  * FUNCTION_NAME: The name of the deployed Cloud Run function.

* OPTIMIZATION_MODE: The optimization mode: instruction, demonstration, or instruction_and_demo.
* SAMPLE_PROMPT_URI: The URI for the sample prompts in your Cloud Storage bucket (for example, gs://bucket-name/sample-prompts.jsonl).
* OUTPUT_URI: The URI for the Cloud Storage bucket where the optimizer will write the results (for example, gs://bucket-name/output-path).
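For example, a complete configuration that combines two standard metrics might be written as follows. This is an illustrative sketch: the model name, bucket paths, metrics, and weights are example values, not recommendations.

# Illustrative only: build a configuration file for a two-metric, instruction-mode job.
import json

config = {
    "project": "my-project",                      # PROJECT_ID (example value)
    "system_instruction": "You are a professional chef. ...",
    "prompt_template": "Question: {input_question}\nFacts: {input_context}",
    "target_model": "gemini-2.5-flash",           # example target model
    "thinking_budget": -1,                        # auto thinking
    "eval_metrics_types": ["question_answering_quality", "groundedness"],
    "eval_metrics_weights": [0.7, 0.3],           # one weight per metric
    "aggregation_type": "weighted_sum",
    "optimization_mode": "instruction",
    "input_data_path": "gs://bucket-name/sample-prompts.jsonl",
    "output_path": "gs://bucket-name/output-path",
}

# Write the file locally, then upload it to Cloud Storage, for example with:
#   gsutil cp configuration.json gs://bucket-name/configuration.json
with open("configuration.json", "w") as f:
    json.dump(config, f, indent=2)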
"num_steps": NUM_INST_OPTIMIZATION_STEPS,
"num_demo_set_candidates": "NUM_DEMO_OPTIMIZATION_STEPS,
"demo_set_size": NUM_DEMO_PER_PROMPT,
"target_model_location": "TARGET_MODEL_LOCATION",
"source_model": "SOURCE_MODEL",
"source_model_location": "SOURCE_MODEL_LOCATION",
"target_model_qps": TARGET_MODEL_QPS,
"eval_qps": EVAL_QPS,
"source_model_qps": SOURCE_MODEL_QPS,
"response_mime_type": "RESPONSE_MIME_TYPE",
"language": "TARGET_LANGUAGE",
"placeholder_to_content": "PLACEHOLDER_TO_CONTENT",
"data_limit": DATA_LIMIT
Replace the following:

Optimization process parameters:

* NUM_INST_OPTIMIZATION_STEPS: The number of iterations for instruction optimization. Runtime increases with this value. This value must be an integer from 10 to 20. Default: 10.
* NUM_DEMO_OPTIMIZATION_STEPS: The number of demonstrations to evaluate. This value is used with the demonstration and instruction_and_demo modes and must be an integer between 2 and the total number of sample prompts - 1. Default: 10.
* NUM_DEMO_PER_PROMPT: The number of demonstrations generated per prompt. This value must be an integer between 3 and 6. Default: 3.

Model selection and location parameters:

* TARGET_MODEL_LOCATION: The location to run the target model in. Default: us-central1.
* SOURCE_MODEL: The Google model previously used with the prompts. The optimizer uses this model to generate ground truth responses.
* SOURCE_MODEL_LOCATION: The location to run the source model in. Default: us-central1.

Latency (QPS) parameters:

Note: You need to set a QPS that is lower than or equal to your available QPM quota; otherwise, the job fails. To convert QPM to QPS, divide your QPM by 60. For example, a quota of 300 QPM corresponds to 5 QPS.

* TARGET_MODEL_QPS: The QPS to send to the target model. Runtime decreases as this value increases. This value must be a float of 3.0 or greater. Default: 3.0.
* EVAL_QPS: The QPS to send to the evaluation service or Cloud Run function. This value must be a float of 3.0 or greater. Default: 3.0.
* SOURCE_MODEL_QPS: The QPS to send to the source model. This value must be a float of 3.0 or greater. Default: 3.0.

Other parameters:

* RESPONSE_MIME_TYPE: The MIME response type for the target model (text/plain or application/json). Default: text/plain.
* TARGET_LANGUAGE: The language of the system instructions. Default: English.
* PLACEHOLDER_TO_CONTENT: Information to replace any variables in the system instructions. This content is not optimized.
* DATA_LIMIT: The amount of data used for validation. Runtime increases with this value. This value must be an integer between 5 and 100. Default: 100.

Run the prompt optimizer

Run the data-driven optimizer using one of the following options:
Notebook
To run the optimizer from the notebook:

* In Colab Enterprise, open the Vertex AI prompt optimizer notebook.
* In the Run prompt optimizer section, click play_circle Run cell.

REST

Before using any of the request data, make the following replacements:

* LOCATION: The location where you want to run the optimizer.
* PROJECT_ID: Your project ID.
* PATH_TO_CONFIG: The URI of the configuration file in Cloud Storage (for example, gs://bucket-name/configuration.json).

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/customJobs

Request JSON body:
{
"displayName": "JOB_NAME",
"jobSpec": {
"workerPoolSpecs": [
{
"machineSpec": {
"machineType": "n1-standard-4"
},
"replicaCount": 1,
"containerSpec": {
"imageUri": "us-docker.pkg.dev/vertex-ai-restricted/builtin-algorithm/apd:preview_v1_0",
"args": ["--config=PATH_TO_CONFIG""]
}
}
]
}
}
To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/customJobs"PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/customJobs" | Select-Object -Expand ContentSDK
* LOCATION: The location where you want to run the optimizer.
* PROJECT_ID: Your project ID.
* PROJECT_NUMBER: Your project number, available in the Google Cloud console.
* PATH_TO_CONFIG: The URI of the configuration file in Cloud Storage (for example, gs://bucket-name/configuration.json).

# Authenticate
from google.colab import auth
auth.authenticate_user(project_id=PROJECT_ID)
# Set the Service Account
SERVICE_ACCOUNT = f"{PROJECT_NUMBER}-compute@developer.gserviceaccount.com"
# Import Vertex AI SDK and Setup
import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)
# Create the Vertex AI client
client = vertexai.Client(project=PROJECT_ID, location=LOCATION)

# Set up the job configuration
vapo_config = {
    'config_path': PATH_TO_CONFIG,
    'service_account': SERVICE_ACCOUNT,
    'wait_for_completion': True,
}

# Start the Vertex AI prompt optimizer
result = client.prompt_optimizer.optimize(method="vapo", config=vapo_config)

Once the optimization completes, examine the output artifacts at the output location specified in the config.
Analyze results and iterate

After you run the optimizer, review the job's progress and results using one of the following options:

Notebook

To view the results in the notebook:

* Open the Vertex AI prompt optimizer notebook.
* In the Inspect the results section, do the following:
  * In the RESULT_PATH field, add the URI of the Cloud Storage bucket where you configured the optimizer to write results (for example, gs://bucket-name/output-path).
  * Click play_circle Run cell.

Console

To view the job and its results in the Google Cloud console:
* In the Google Cloud console, in the Vertex AI section, go to the Training pipelines page.
* Click the Custom jobs tab. The optimizer's custom training job appears in the list with its status.

When the job is finished, review the optimizations:

* In the Google Cloud console, go to the Cloud Storage Buckets page.
* Click the name of your Cloud Storage bucket.
* Navigate to the folder named after the optimization mode you used (instruction or demonstration). If you used instruction_and_demo mode, both folders appear. The instruction folder contains the results for the system instruction optimization, and the demonstration folder contains the results for the few-shot example selection and, if applicable, the final optimized system instructions.

The folder contains the following files:
* config.json: The complete configuration used for the job.
* templates.json: Each set of system instructions and/or few-shot examples generated by the optimizer, along with their evaluation scores.
* eval_results.json: The target model's response and evaluation score for each sample prompt against each generated template.
* optimized_results.json: The best-performing system instructions and/or few-shot examples and their final evaluation score.

To view the optimized system instructions, open the optimized_results.json file.
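You can also read the results programmatically. The following sketch is illustrative and assumes that the google-cloud-storage library is installed, that the job ran in instruction mode, and that it wrote its output to gs://bucket-name/output-path; substitute your own project, bucket, and path.

# Illustrative only: download and print optimized_results.json from the output location.
import json
from google.cloud import storage

client = storage.Client(project="my-project")  # example project ID
blob = client.bucket("bucket-name").blob("output-path/instruction/optimized_results.json")
results = json.loads(blob.download_as_text())
print(json.dumps(results, indent=2))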
Best practices

* Preview models are served only from the global region, which is not supported by Vertex AI custom jobs. Therefore, do not use the prompt optimizer for preview models.
* Use a region such as us-central1 or europe-central2 instead of global to comply with data residency requirements.