A checkpoint is a snapshot of a model's state at a specific point in the fine-tuning process. You can use intermediate checkpoints in Gemini model fine-tuning to do the following:
- Save tuning progress.
- Compare the performance of intermediate checkpoints.
- Select the best-performing checkpoint, before the model starts to overfit, as the default checkpoint.
For tuning jobs with fewer than 10 epochs, one checkpoint is saved approximately after each epoch. For tuning jobs with more than 10 epochs, around 10 checkpoints are saved, evenly distributed across the run. The exception is the final checkpoint, which is saved immediately after all epochs are trained.
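The schedule described above can be approximated with a few lines of Python. This is only an illustrative sketch: the function name `estimate_checkpoint_epochs` is hypothetical, the service describes its own spacing as approximate, and the handling of a job with exactly 10 epochs (treated here as one checkpoint per epoch) is an assumption.

```python
def estimate_checkpoint_epochs(epochs: int, max_checkpoints: int = 10) -> list[int]:
    """Estimate the epochs after which checkpoints are saved.

    Illustrative only: the actual schedule is described as approximate.
    """
    if epochs < 1:
        raise ValueError("epochs must be a positive integer")
    if epochs <= max_checkpoints:
        # Short jobs: roughly one checkpoint after each epoch.
        # Assumption: a job with exactly 10 epochs is treated the same way.
        return list(range(1, epochs + 1))
    # Longer jobs: about `max_checkpoints` evenly spaced checkpoints.
    # Integer ceiling division keeps the last entry equal to `epochs`,
    # matching the final checkpoint saved after all epochs are trained.
    return [
        (epochs * i + max_checkpoints - 1) // max_checkpoints
        for i in range(1, max_checkpoints + 1)
    ]


print(estimate_checkpoint_epochs(4))   # [1, 2, 3, 4]
print(estimate_checkpoint_epochs(20))  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

Use the step and epoch numbers reported for each checkpoint as the source of truth; this estimate is only useful for planning how many checkpoints to expect.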
Intermediate checkpoints are deployed to new endpoints sequentially as tuning progresses. The tuned model endpoint represents the endpoint of the default checkpoint, and the tuned model checkpoints include all checkpoints and their corresponding endpoints.
Supported models
The following Gemini models support checkpoints:
gemini-2.0-flash-001
gemini-2.0-flash-lite-001
For detailed information about Gemini model versions, see Google models and Model versions and lifecycle.
Create a tuning job that exports checkpoints
You can create a supervised fine-tuning job that exports checkpoints by using the Google Gen AI SDK or the Google Cloud console.
Console
To create a tuning job that exports checkpoints, go to the Vertex AI Studio page and select the Tuning tab. For more information, see Tune a model.
Google Gen AI SDK
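A minimal sketch of creating a tuning job that keeps intermediate checkpoints, assuming the `google-genai` Python package is installed and Vertex AI credentials are configured. The function name, the display name, and the polling interval are illustrative placeholders, and the set of "still running" states is an assumption; supply your own project, location, and Cloud Storage dataset URI.

```python
import time

# Job states treated as "not finished yet" (assumed set for this sketch).
RUNNING_STATES = {"JOB_STATE_PENDING", "JOB_STATE_RUNNING"}


def is_still_running(state) -> bool:
    """Return True while a job state (enum member or plain string) is non-terminal."""
    return getattr(state, "name", state) in RUNNING_STATES


def tune_with_checkpoints(project, location, training_data_uri,
                          base_model="gemini-2.0-flash-001",
                          display_name="example-tuning-job",
                          poll_seconds=60):
    """Start a supervised fine-tuning job and wait until it finishes."""
    # Imported lazily so this module loads even without the SDK installed.
    from google import genai
    from google.genai.types import CreateTuningJobConfig, TuningDataset

    client = genai.Client(vertexai=True, project=project, location=location)
    tuning_job = client.tunings.tune(
        base_model=base_model,
        training_dataset=TuningDataset(gcs_uri=training_data_uri),
        config=CreateTuningJobConfig(
            tuned_model_display_name=display_name,
            # False keeps intermediate checkpoints; True exports only the final one.
            export_last_checkpoint_only=False,
        ),
    )
    # Poll the job until it leaves the pending/running states.
    while is_still_running(tuning_job.state):
        time.sleep(poll_seconds)
        tuning_job = client.tunings.get(name=tuning_job.name)
    return tuning_job
```

Calling `tune_with_checkpoints("my-project", "us-central1", "gs://my-bucket/train.jsonl")` (placeholder values) returns the finished job, whose `tuned_model.checkpoints` field holds the exported checkpoints.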
List the checkpoints for a tuning job
You can view the checkpoints for your completed tuning job in the Google Cloud console or list them by using the Google Gen AI SDK.
If intermediate checkpoints are disabled, only the final checkpoint is displayed or returned.
Console
To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.
In the Tuning tab, find your model and click Monitor.
The tuning metrics and checkpoints of your model are shown. In each metrics graph, checkpoint numbers are displayed as annotations as follows:
- For each epoch, you see a step number and an epoch number.
- The step number is the exact step when a checkpoint is saved.
- The epoch number is an estimated epoch number that the checkpoint belongs to, except for the final checkpoint for a completed tuning job, which has the exact epoch number.
Google Gen AI SDK
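A minimal sketch of listing checkpoints with the SDK, assuming the `google-genai` Python package and a tuning job resource name that you already have. The helper names are hypothetical, and the offline demo uses made-up values in place of real SDK objects.

```python
from types import SimpleNamespace


def describe_checkpoints(checkpoints) -> list[str]:
    """Format one summary line per checkpoint; empty list if there are none."""
    return [
        f"checkpoint {cp.checkpoint_id}: epoch={cp.epoch}, step={cp.step}, "
        f"endpoint={cp.endpoint}"
        for cp in (checkpoints or [])
    ]


def list_checkpoints(project: str, location: str, tuning_job_name: str) -> list[str]:
    """Fetch a tuning job from Vertex AI and describe its checkpoints."""
    from google import genai  # lazy import: needs `pip install google-genai`

    client = genai.Client(vertexai=True, project=project, location=location)
    tuning_job = client.tunings.get(name=tuning_job_name)
    tuned_model = tuning_job.tuned_model  # may be unset before the job produces a model
    return describe_checkpoints(tuned_model.checkpoints if tuned_model else None)


# Offline demo with made-up values standing in for SDK checkpoint objects.
fake = [SimpleNamespace(checkpoint_id="1", epoch=1, step=10, endpoint="endpoints/111")]
print(describe_checkpoints(fake)[0])
```

If intermediate checkpoints were disabled for the job, the returned list contains a single entry for the final checkpoint.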
View model details and checkpoints
You can view your tuned model in the Google Cloud console or use the Google Gen AI SDK to get model details and checkpoints.
Console
You can view your tuned model in the Vertex AI Model Registry and on the Online prediction Endpoints page.
Go to the Model Registry page from the Vertex AI section in the Google Cloud console.
Click the name of your model.
The default version of your model appears.
Click the Version details tab to see information about your model version.
Note that the Objective is Large model, the Model type is Foundation, and the Source is Vertex AI Studio tuning.
Click the Deploy & test tab to see the endpoint where the model is deployed.
Click the endpoint name to go to the Endpoint page to see the list of checkpoints that are deployed to the endpoint. For each checkpoint, the model version ID and checkpoint ID are displayed.
Google Gen AI SDK
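A minimal sketch of fetching the tuned model and finding its default checkpoint, assuming the `google-genai` Python package and that the tuned model's resource name is known (for example, from the tuning job's `tuned_model.model` field). The helper names are hypothetical, and the demo object uses made-up values.

```python
from types import SimpleNamespace


def find_default_checkpoint(model):
    """Return the checkpoint whose ID matches the model's default, or None."""
    for cp in getattr(model, "checkpoints", None) or []:
        if cp.checkpoint_id == model.default_checkpoint_id:
            return cp
    return None


def get_tuned_model(project: str, location: str, tuned_model_name: str):
    """Fetch model details, including checkpoints, from Vertex AI."""
    from google import genai  # lazy import: needs `pip install google-genai`

    client = genai.Client(vertexai=True, project=project, location=location)
    return client.models.get(model=tuned_model_name)


# Offline demo with made-up values standing in for an SDK model object.
fake_model = SimpleNamespace(
    default_checkpoint_id="2",
    checkpoints=[
        SimpleNamespace(checkpoint_id="1", epoch=1, step=10),
        SimpleNamespace(checkpoint_id="2", epoch=2, step=20),
    ],
)
print(find_default_checkpoint(fake_model).step)
```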
Test the checkpoints
You can view a list of checkpoints in the Vertex AI Model Registry and test each one. Alternatively, you can use the Google Gen AI SDK to list and test your checkpoints.
Console
To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.
In the Tuning tab, find your model and click Monitor.
In the checkpoint table in the Monitor pane, next to the desired checkpoint, click the Test link.
Google Gen AI SDK
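A minimal sketch of sending the same prompt to the default endpoint and to every checkpoint endpoint, assuming the `google-genai` Python package. The helper names and the labels in the returned dictionary are hypothetical, and the demo object uses made-up endpoint names.

```python
from types import SimpleNamespace


def endpoints_to_test(tuned_model) -> dict[str, str]:
    """Map a label to each endpoint: the default one plus every checkpoint."""
    targets = {"default": tuned_model.endpoint}
    for cp in tuned_model.checkpoints or []:
        targets[f"checkpoint {cp.checkpoint_id}"] = cp.endpoint
    return targets


def compare_checkpoints(project, location, tuning_job_name, prompt) -> dict[str, str]:
    """Return the text response of each endpoint for the same prompt."""
    from google import genai  # lazy import: needs `pip install google-genai`

    client = genai.Client(vertexai=True, project=project, location=location)
    tuned_model = client.tunings.get(name=tuning_job_name).tuned_model
    return {
        label: client.models.generate_content(model=endpoint, contents=prompt).text
        for label, endpoint in endpoints_to_test(tuned_model).items()
    }


# Offline demo with made-up values standing in for an SDK tuned-model object.
fake_tuned = SimpleNamespace(
    endpoint="endpoints/200",
    checkpoints=[SimpleNamespace(checkpoint_id="1", endpoint="endpoints/100")],
)
print(endpoints_to_test(fake_tuned))
```

Comparing responses side by side for a small set of representative prompts is one way to decide which checkpoint to promote to default.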
Select a new default checkpoint
You can use the default checkpoint to represent the best performing checkpoint. By default, the default checkpoint is the final checkpoint of a tuning job.
When deploying a model with checkpoints, the default checkpoint is deployed.
When copying a model with checkpoints, the destination model has the same default checkpoint ID as the source model. All checkpoints are copied, so you can select a new default checkpoint for the destination model.
If you update the default checkpoint, the tuning job endpoint is updated, and you can use the new endpoint for prediction.
Console
To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.
In the Tuning tab, find your model and click Monitor.
In the checkpoint table in the Monitor pane, next to the desired checkpoint, click Actions and select Set as default.
Click Confirm.
The metrics graphs and checkpoint table are updated to show the new default checkpoint. The endpoint on the TuningJob details page is updated to show the endpoint of the new default checkpoint.
Google Gen AI SDK
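A minimal sketch of selecting a new default checkpoint with the SDK, assuming the `google-genai` Python package and a known tuned model resource name. The helper names are hypothetical; the validation step is an added safeguard for this example, not something the service requires, and the demo list uses made-up IDs.

```python
from types import SimpleNamespace


def require_checkpoint(checkpoints, checkpoint_id: str) -> str:
    """Return checkpoint_id if it exists among the checkpoints, else raise."""
    known = [cp.checkpoint_id for cp in (checkpoints or [])]
    if checkpoint_id not in known:
        raise ValueError(f"unknown checkpoint {checkpoint_id!r}; available: {known}")
    return checkpoint_id


def set_default_checkpoint(project, location, tuned_model_name, checkpoint_id):
    """Make `checkpoint_id` the default checkpoint of the tuned model."""
    from google import genai  # lazy import: needs `pip install google-genai`
    from google.genai.types import UpdateModelConfig

    client = genai.Client(vertexai=True, project=project, location=location)
    model = client.models.get(model=tuned_model_name)
    require_checkpoint(model.checkpoints, checkpoint_id)
    return client.models.update(
        model=tuned_model_name,
        config=UpdateModelConfig(default_checkpoint_id=checkpoint_id),
    )


# Offline demo with made-up values standing in for SDK checkpoint objects.
fake_checkpoints = [SimpleNamespace(checkpoint_id="1"), SimpleNamespace(checkpoint_id="2")]
print(require_checkpoint(fake_checkpoints, "2"))
```

After the update, fetch the tuning job again to read the endpoint of the new default checkpoint before sending prediction requests.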
What's next
- Learn more about supervised fine-tuning for Gemini models.