Use checkpoints in supervised fine-tuning for Gemini models

A checkpoint is a snapshot of a model's state at a specific point in the fine-tuning process. You can use intermediate checkpoints in Gemini model fine-tuning to do the following:

  • Save tuning progress.
  • Compare the performance of intermediate checkpoints.
  • Select the best-performing checkpoint, such as one saved before the model begins to overfit, as the default checkpoint.

For tuning jobs with fewer than 10 epochs, a checkpoint is saved approximately once per epoch. For tuning jobs with more than 10 epochs, about 10 checkpoints are saved at evenly spaced intervals. The exception is the final checkpoint, which is saved immediately after all epochs finish training.
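This cadence can be sketched as follows. This is an illustrative approximation only, not the service's exact scheduling algorithm:

```python
def checkpoint_epochs(total_epochs: int, max_checkpoints: int = 10) -> list[int]:
    """Approximate the epochs at which checkpoints are saved (illustrative)."""
    if total_epochs <= max_checkpoints:
        # Roughly one checkpoint per epoch.
        return list(range(1, total_epochs + 1))
    # About max_checkpoints checkpoints at evenly spaced intervals,
    # always including the final epoch.
    spacing = total_epochs / max_checkpoints
    epochs = sorted({round(spacing * i) for i in range(1, max_checkpoints + 1)})
    epochs[-1] = total_epochs
    return epochs

print(checkpoint_epochs(5))   # one checkpoint per epoch
print(checkpoint_epochs(25))  # about 10 checkpoints, ending at epoch 25
```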

As tuning progresses, each intermediate checkpoint is deployed to its own new endpoint. The tuned model endpoint is the endpoint of the default checkpoint, and the tuned model's checkpoint list includes all checkpoints and their corresponding endpoints.

Supported models

The following Gemini models support checkpoints:

  • gemini-2.0-flash-001
  • gemini-2.0-flash-lite-001

For detailed information about Gemini model versions, see Google models and Model versions and lifecycle.

Create a tuning job that exports checkpoints

You can create a supervised fine-tuning job that exports checkpoints by using the Google Gen AI SDK or the Google Cloud console.

Console

To create a tuning job that exports checkpoints, go to the Vertex AI Studio page and select the Tuning tab. For more information, see Tune a model.

Google Gen AI SDK

import time

from google import genai
from google.genai.types import HttpOptions, CreateTuningJobConfig

client = genai.Client(http_options=HttpOptions(api_version="v1"))

tuning_job = client.tunings.tune(
    base_model="gemini-2.0-flash-lite-001",
    training_dataset="gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/text/sft_train_data.jsonl",
    config=CreateTuningJobConfig(
        tuned_model_display_name="Example tuning job",
        # Set to True to export only the final checkpoint and disable
        # intermediate checkpoints. Default is False.
        export_last_checkpoint_only=False,
    ),
)

running_states = {
    "JOB_STATE_PENDING",
    "JOB_STATE_RUNNING",
}

while tuning_job.state in running_states:
    print(tuning_job.state)
    tuning_job = client.tunings.get(name=tuning_job.name)
    time.sleep(60)

print(tuning_job.tuned_model.model)
print(tuning_job.tuned_model.endpoint)
print(tuning_job.experiment)
# Example response:
# projects/123456789012/locations/us-central1/models/1234567890@1
# projects/123456789012/locations/us-central1/endpoints/123456789012345
# projects/123456789012/locations/us-central1/metadataStores/default/contexts/tuning-experiment-2025010112345678

if tuning_job.tuned_model.checkpoints:
    for i, checkpoint in enumerate(tuning_job.tuned_model.checkpoints):
        print(f"Checkpoint {i + 1}: ", checkpoint)
    # Example response:
    # Checkpoint 1:  checkpoint_id='1' epoch=1 step=10 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789000000'
    # Checkpoint 2:  checkpoint_id='2' epoch=2 step=20 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789012345'

List the checkpoints for a tuning job

You can view the checkpoints for your completed tuning job in the Google Cloud console or list them by using the Google Gen AI SDK.

If intermediate checkpoints are disabled, only the final checkpoint is displayed or returned.

Console

  1. To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

    The tuning metrics and checkpoints of your model are shown. In each metrics graph, checkpoint numbers are displayed as annotations as follows:

    • For each checkpoint, you see a step number and an epoch number.
    • The step number is the exact step when a checkpoint is saved.
    • The epoch number is an estimated epoch number that the checkpoint belongs to, except for the final checkpoint for a completed tuning job, which has the exact epoch number.
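For example, the estimated epoch number for an intermediate checkpoint can be derived from its step number, given the number of steps per epoch (the value below is hypothetical and depends on your dataset size and batch size):

```python
STEPS_PER_EPOCH = 10  # hypothetical; determined by dataset size and batch size

def estimated_epoch(step: int) -> int:
    """Estimate which epoch a checkpoint saved at `step` belongs to."""
    # Ceiling division: step 25 with 10 steps per epoch falls in epoch 3.
    return -(-step // STEPS_PER_EPOCH)

print(estimated_epoch(10))  # epoch 1
print(estimated_epoch(25))  # epoch 3
```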

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=name)

if tuning_job.tuned_model.checkpoints:
    for i, checkpoint in enumerate(tuning_job.tuned_model.checkpoints):
        print(f"Checkpoint {i + 1}: ", checkpoint)
# Example response:
# Checkpoint 1:  checkpoint_id='1' epoch=1 step=10 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789000000'
# Checkpoint 2:  checkpoint_id='2' epoch=2 step=20 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789012345'

View model details and checkpoints

You can view your tuned model in the Google Cloud console, or use the Google Gen AI SDK to get model details and checkpoints.

Console

You can view your tuned model in the Vertex AI Model Registry and on the Online prediction Endpoints page.

  1. Go to the Model Registry page from the Vertex AI section in the Google Cloud console.

    Go to the Model Registry page

  2. Click the name of your model.

    The default version of your model appears.

  3. Click the Version details tab to see information about your model version.

    Note that the Objective is Large model, the Model type is Foundation, and the Source is Vertex AI Studio tuning.

  4. Click the Deploy & test tab to see the endpoint where the model is deployed.

  5. Click the endpoint name to go to the Endpoint page to see the list of checkpoints that are deployed to the endpoint. For each checkpoint, the model version ID and checkpoint ID are displayed.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=name)
tuned_model = client.models.get(model=tuning_job.tuned_model.model)
print(tuned_model)
# Example response:
# Model(name='projects/123456789012/locations/us-central1/models/1234567890@1', ...)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
# Example response:
# Default checkpoint: 2

if tuned_model.checkpoints:
    for checkpoint in tuned_model.checkpoints:
        print(f"Checkpoint {checkpoint.checkpoint_id}: ", checkpoint)
# Example response:
# Checkpoint 1:  checkpoint_id='1' epoch=1 step=10
# Checkpoint 2:  checkpoint_id='2' epoch=2 step=20

Test the checkpoints

You can view a list of checkpoints in the Vertex AI Model Registry and test each one. Or you can use the Google Gen AI SDK to list and test your checkpoints.

Console

  1. To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

  3. In the checkpoint table in the Monitor pane, next to the desired checkpoint, click the Test link.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=name)

contents = "Why is the sky blue?"

# Tests the default checkpoint
response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,
    contents=contents,
)
print(response.text)

# Tests Checkpoint 1
checkpoint1_response = client.models.generate_content(
    model=tuning_job.tuned_model.checkpoints[0].endpoint,
    contents=contents,
)
print(checkpoint1_response.text)

# Tests Checkpoint 2
checkpoint2_response = client.models.generate_content(
    model=tuning_job.tuned_model.checkpoints[1].endpoint,
    contents=contents,
)
print(checkpoint2_response.text)

Select a new default checkpoint

The default checkpoint represents the checkpoint that you consider the best performing. Initially, the default checkpoint is the final checkpoint of the tuning job.

When deploying a model with checkpoints, the default checkpoint is deployed.

When you copy a model that has checkpoints, the destination model has the same default checkpoint ID as the source model. Because all checkpoints are copied, you can select a new default checkpoint for the destination model.

When you change the default checkpoint, the tuned model endpoint is updated, and you can use the new endpoint for prediction.
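Choosing which checkpoint to promote is up to you. For example, if you have evaluated each checkpoint's endpoint against a validation set, you might pick the one with the lowest loss. The metric values below are hypothetical; the SDK does not report validation loss on checkpoint objects:

```python
# Hypothetical validation losses collected by evaluating each checkpoint's
# endpoint against a held-out dataset: checkpoint_id -> loss.
eval_losses = {"1": 0.42, "2": 0.35, "3": 0.39}

# Loss rises again at checkpoint 3, which suggests overfitting, so
# checkpoint 2 is the best candidate for the new default.
best_checkpoint_id = min(eval_losses, key=eval_losses.get)
print(best_checkpoint_id)
```

You would then pass `best_checkpoint_id` as the `default_checkpoint_id` in `UpdateModelConfig`, as in the SDK sample in this section.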

Console

  1. To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

  3. In the checkpoint table in the Monitor pane, next to the desired checkpoint, click Actions and select Set as default.

  4. Click Confirm.

    The metrics graphs and checkpoint table are updated to show the new default checkpoint. The endpoint in the TuningJob details page is updated to show the Endpoint of the new default checkpoint.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions, UpdateModelConfig

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=name)
tuned_model = client.models.get(model=tuning_job.tuned_model.model)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
print(f"Tuned model endpoint: {tuning_job.tuned_model.endpoint}")
# Example response:
# Default checkpoint: 2
# Tuned model endpoint: projects/123456789012/locations/us-central1/endpoints/123456789012345

# Set a new default checkpoint.
# Eg. checkpoint_id = "1"
tuned_model = client.models.update(
    model=tuned_model.name,
    config=UpdateModelConfig(default_checkpoint_id=checkpoint_id),
)

# Re-fetch the tuning job to see the updated endpoint.
tuning_job = client.tunings.get(name=tuning_job.name)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
print(f"Tuned model endpoint: {tuning_job.tuned_model.endpoint}")
# Example response:
# Default checkpoint: 1
# Tuned model endpoint: projects/123456789012/locations/us-central1/endpoints/123456789000000

What's next