About preference tuning for Gemini models

Vertex AI preference tuning lets you tune your Gemini models with human feedback data.

Preference tuning enables the model to learn from subjective user preferences that are hard to capture with explicit labels or through supervised fine-tuning alone.

The preference tuning input dataset contains examples that each consist of a prompt and a pair of responses, with an indication of which response is preferred and which is dispreferred. The model learns to generate preferred responses with higher probability and dispreferred responses with lower probability.
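To make the example shape concrete, here is a minimal sketch of serializing one preference example as a JSONL line. The field names (`prompt`, `chosen`, `rejected`) are illustrative placeholders, not the official schema; see Prepare preference tuning data for Gemini models for the exact format.

```python
import json

def make_preference_example(prompt: str, preferred: str, dispreferred: str) -> str:
    """Serialize one preference example as a JSONL line.

    Field names are hypothetical; use the schema from the dataset
    preparation guide in real datasets.
    """
    example = {
        "prompt": prompt,
        "chosen": preferred,       # response the model should up-weight
        "rejected": dispreferred,  # response the model should down-weight
    }
    return json.dumps(example)

line = make_preference_example(
    "Summarize this email in one sentence.",
    "Your package ships Friday; tracking arrives by email.",
    "The email discusses various shipping-related matters.",
)
```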

To learn how to prepare the dataset, see Prepare preference tuning data for Gemini models.

Supported models

The following Gemini models support preference tuning:

Limitations

| Specification | Value |
| --- | --- |
| Modalities | Text |
| File size | 10 GB |
| Maximum input and output training tokens | 131,072 |
| Maximum input and output serving tokens | Same as base Gemini model |
| Maximum training dataset size | 1M text-only examples |

Best practices

Before you apply the preference optimization algorithm to your model, we strongly recommend that you do the following:

  1. Tune the model using supervised fine-tuning on the preferred response data. This teaches the model to generate preferred responses during inference.
  2. Continue tuning from the checkpoint produced from step 1 using preference tuning. This teaches the model to increase the likelihood gap between preferred and dispreferred responses.

To create the supervised fine-tuning dataset, use the prompt and preferred response pairs in your preference dataset as the prompt and target for your supervised fine-tuning dataset. Typically, one or two epochs of supervised fine-tuning are sufficient, although this can vary based on the dataset size and how closely your training dataset aligns with the base Gemini model.
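The conversion described above can be sketched as a small transform over preference examples. The field names (`prompt`, `chosen`, `target`) are illustrative assumptions; map them to the schemas in the respective dataset preparation guides.

```python
def preference_to_sft(preference_examples):
    """Derive a supervised fine-tuning dataset from a preference dataset
    by pairing each prompt with its preferred response.

    Field names are hypothetical placeholders for the real schemas.
    """
    return [
        {"prompt": ex["prompt"], "target": ex["chosen"]}
        for ex in preference_examples
    ]

prefs = [
    {"prompt": "Translate 'bonjour' to English.",
     "chosen": "Hello.",
     "rejected": "It is a French word."},
]
sft = preference_to_sft(prefs)
```

The dispreferred response is intentionally dropped here: supervised fine-tuning in step 1 only teaches the model to produce the preferred responses, while step 2 uses the full pairs to widen the likelihood gap.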

To use supervised fine-tuning to tune the model, follow the steps in Tune Gemini models by using supervised fine-tuning.

Quota

Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. If you want to run more jobs concurrently, you need to request additional quota for Global concurrent tuning jobs.

Pricing

For Gemini preference tuning pricing, see Vertex AI pricing.

For pricing purposes, the number of tokens for each tuning example is calculated by multiplying the number of tokens in the prompt by 2, and then adding the number of completion tokens.
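As a quick sanity check, the per-example billable token count from the formula above can be computed as follows (a worked illustration, not an official pricing tool):

```python
def billable_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    """Per-example token count for preference tuning pricing:
    the prompt token count multiplied by 2, plus the completion tokens."""
    return prompt_tokens * 2 + completion_tokens

# Example: a 1,000-token prompt with 600 completion tokens
# is billed as 2 * 1000 + 600 = 2,600 tokens.
cost_tokens = billable_tokens(1000, 600)
```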

What's next