Tune Gemini Pro models by using supervised tuning

This document describes how to tune a Gemini model by using supervised tuning.

Before you begin

Create a tuning job

You can create a supervised text model tuning job by using the Google Cloud console, API, or the Vertex AI SDK for Python. For guidance on model tuning configurations, see Recommended configurations.

REST

To create a model tuning job, send a POST request by using the tuningJobs.create method. Note that some of the parameters are not supported by all of the models. Ensure that you only include the applicable parameters for the model that you're tuning.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • BASE_MODEL: Name of the foundation model to tune. Supported values: gemini-1.0-pro-002.
  • TRAINING_DATASET_URI: Cloud Storage URI of your training dataset. The dataset must be formatted as a JSONL file. For best results, provide at least 100 to 500 examples. For more information, see About supervised tuning datasets .
  • VALIDATION_DATASET_URIOptional: The Cloud Storage URI of your validation dataset file.
  • EPOCH_COUNTOptional: The number of epochs to train. Leave it unset to use the recommeded value.
  • LEARNING_RATE_MULTIPLIER: Optional: A multiplier to apply to the recommended learning rate. Leave it unset to use the recommended value.
  • TUNED_MODEL_DISPLAYNAMEOptional: A display name for the tuned model. If not set, a random name is generated.

HTTP method and URL:

POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs

Request JSON body:

{
  "baseModel": "BASE_MODEL",
  "supervisedTuningSpec" : {
      "training_dataset_uri": "TRAINING_DATASET_URI",
      "validation_dataset_uri": "VALIDATION_DATASET_URI",
      "hyper_parameters": {
          "epoch_count": EPOCH_COUNT,
          "learning_rate_multiplier": LEARNING_RATE_MULTIPLIER
      },
  },
  "tunedModelDisplayName": "TUNED_MODEL_DISPLAYNAME,
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Python


import time

import vertexai
from vertexai.preview.tuning import sft

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"

vertexai.init(project=project_id, location="us-central1")

sft_tuning_job = sft.train(
    source_model="gemini-1.0-pro-002",
    train_dataset="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
    # The following parameters are optional
    validation_dataset="gs://cloud-samples-data/ai-platform/generative_ai/sft_validation_data.jsonl",
    epochs=4,
    learning_rate_multiplier=1.0,
    tuned_model_display_name="tuned_gemini_pro",
)

# Polling for job completion
while not sft_tuning_job.has_ended:
    time.sleep(60)
    sft_tuning_job.refresh()

print(sft_tuning_job.tuned_model_name)
print(sft_tuning_job.tuned_model_endpoint_name)
print(sft_tuning_job.experiment)

Console

To tune a text model with supervised tuning by using the Google Cloud console, perform the following steps:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. Click the Tune and distill tab.

  3. Click Create tuned model.

  4. Under Tuning method, select the radio button for Supervised tuning.

  5. Under Model details, configure the following:

    1. In the Tuned model name field, enter a name for your new tuned model, up to 128 characters.
    2. In the Base model field, select gemini-1.0-pro-002.
    3. In the Region drop-down field, Select the region where the pipeline tuning job runs and where the tuned model is deployed.
  6. Optional: expand the Advanced Options drop down arrow and configure the following:

    1. In the Number of epochs field, enter the number of steps to run for model tuning.
    2. In the Learning rate multiplier field, enter a Enter the step size at each iteration. The default value is 1.
  7. Click Continue.

    The Tuning dataset page opens.

  8. To upload a dataset file, select one of the following:

    1. If you haven't upload a dataset yet, select the radio button for Upload file to Cloud Storage.
    2. In the Select JSONL file field, click Browse and select your dataset file.
    3. In the Dataset location field, click Browse and select the Cloud Storage bucket where you want to store your dataset file.
    4. If your dataset file is already in a Cloud Storage bucket, select the radio button for Existing file on Cloud Storage.
    5. In Cloud Storage file path field, click Browse and select the Cloud Storage bucket where your dataset file is located.
  9. (Optional) To get validation metrics during training, click the Enable model validation toggle.

    1. In the Validation dataset file, enter the Cloud Storage path of your validation dataset.
  10. Click Start Tuning.

    Your new model appears under the Gemini Pro tuned models section on the Tune and Distill page. When the model is finished tuning, the Status says Succeeded.

Example curl command

PROJECT_ID=myproject
LOCATION=us-central1

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/tuningJobs" \
-d \
$'{
    "baseModel": "gemini-1.0-pro-002",
    "supervisedTuningSpec" : {
        "training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
        "validation_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_validation_data.jsonl"
    },
    "tunedModelDisplayName": "tuned_gemini_pro"
}'

The following table shows the recommended configurations for tuning a foundation model by task:

Task No. of examples in dataset No. of Epochs
Classification 500+ 2-4
Summarization 1000+ 2-4
Extractive QA 500+ 2-4
Chat 1000+ 2-4

For the number of epochs, we suggest that you try more than one value to get the best performance on a particular dataset. Increasing the number of epochs might provide better results. However, you should be wary of overfitting specially when your dataset is small and lower the epoch number if that happens.

For the classification tasks, complexity of the task increases with the number of classes. For large number of classes you might need bigger datasets.

View a list of tuning jobs

You can view a list of tuning jobs in your current project by using the Google Cloud console, the Vertex AI SDK for Python, or by sending a GET request by using the tuningJobs method.

REST

To view a list of model tuning jobs, send a GET request by using the tuningJobs.list method.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.

HTTP method and URL:

GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Python

import vertexai
from vertexai.preview.tuning import sft

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"

vertexai.init(project=project_id, location="us-central1")

responses = sft.SupervisedTuningJob.list()

for response in responses:
    print(response)

Console

To view your tuning jobs in the Google Cloud console, go to the Vertex AI Studio page.

Go to Vertex AI Studio

Your Gemini tuning jobs are listed in the table under the Gemini Pro tuned models section.

Get details of a tuning job

You can get the details of a tuning job in your current project by using the Google Cloud console, the Vertex AI SDK for Python, or by sending a GET request by using the tuningJobs method.

REST

To view a list of model tuning jobs, send a GET request by using the tuningJobs.get method and specify the TuningJob_ID.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • TUNING_JOB_ID: The ID of the tuning job.

HTTP method and URL:

GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Python

import vertexai
from vertexai.preview.tuning import sft

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# tuning_job_id = "TUNING_JOB_ID"

vertexai.init(project=project_id, location="us-central1")

response = sft.SupervisedTuningJob(
    f"projects/{project_id}/locations/{location}/tuningJobs/{tuning_job_id}"
)

print(response)

Console

  1. To view details of a tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Gemini Pro tuned models table, find your model and click Details.

    The details of your model are shown.

Cancel a tuning job

You can cancel a tuning job in your current project by using the Google Cloud console, the Vertex AI SDK for Python, or by sending a POST request using the tuningJobs method.

REST

To view a list of model tuning jobs, send a GET request by using the tuningJobs.cancel method and specify the TuningJob_ID.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • TUNING_JOB_ID: The ID of the tuning job.

HTTP method and URL:

POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel

To send your request, choose one of these options:

curl

Execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Python

import vertexai
from vertexai.preview.tuning import sft

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# tuning_job_id = "TUNING_JOB_ID"

vertexai.init(project=project_id, location="us-central1")

job = sft.SupervisedTuningJob(
    f"projects/{project_id}/locations/{location}/tuningJobs/{tuning_job_id}"
)
job.cancel()

Console

  1. To cancel a tuning job in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Gemini Pro tuned models table, click Manage run.

  3. Click Cancel.

Make a prediction with the tuned model

REST

To get a prediction with the tuned model from a tuning job, send a POST request and specify the TUNED_ENDPOINT_ID.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • ENDPOINT_ID: The tuned model endpoint ID from the GET API.
  • TEMPERATURE: The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

    If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.

  • TOP_P: Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model will select either A or B as the next token by using temperature and excludes C as a candidate.

    Specify a lower value for less random responses and a higher value for more random responses.

  • TOP_K: Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

    For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

    Specify a lower value for less random responses and a higher value for more random responses.

  • MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

    Specify a lower value for shorter responses and a higher value for potentially longer responses.

HTTP method and URL:

POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent

Request JSON body:

{
    "contents": [
        {
            "role": "USER",
            "parts": {
                "text" : "Why is sky blue?"
            }
        }
    ],
    "generation_config": {
        "temperature":TEMPERATURE,
        "top_p": TOP_P,
        "top_k": TOP_K,
        "max_output_tokens": MAX_OUTPUT_TOKENS
    }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Python

from vertexai.preview.generative_models import GenerativeModel
from vertexai.preview import tuning
from vertexai.preview.tuning import sft

sft_tuning_job = sft.SupervisedTuningJob("projects/<PROJECT_ID>/locations/<TUNING_JOB_REGION>/tuningJobs/<TUNING_JOB_ID>")
tuned_model = GenerativeModel(sft_tuning_job.tuned_model_endpoint_name)
print(tuned_model.generate_content(content))

Console

  1. To view details of a tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Gemini Pro tuned models table, select Test.

    It opens up a page where you can create a conversation with your tuned model.

Tuning and validation metrics

You can configure a model tuning job to collect and report model tuning and model evaluation metrics, which can then be visualized on the Vertex AI Studio page.

Model tuning metrics

The model tuning job automatically collects the following tuning metrics for gemini-1.0-pro-002.

  • /train_total_loss: Loss for the tuning dataset at a training step.
  • /train_fraction_of_correct_next_step_preds: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
  • /train_num_predictions: Number of predicted tokens at a training step.

Model validation metrics:

You can configure a model tuning job to collect the following validation metrics for gemini-1.0-pro-002.

  • /eval_total_loss: Loss for the validation dataset at a validation step.
  • /eval_fraction_of_correct_next_step_preds: The token accuracy at an validation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the validation dataset.
  • /eval_num_predictions: Number of predicted tokens at a validation step.

The metrics visualizations are available after the model tuning job completes. If you don't specify a validation dataset when you create the tuning job, only the visualizations for the tuning metrics are available.

What's next