Class PairwiseMetric (1.75.0)

PairwiseMetric(
    *,
    metric: str,
    metric_prompt_template: typing.Union[
        vertexai.evaluation.metrics.metric_prompt_template.PairwiseMetricPromptTemplate,
        str,
    ],
    baseline_model: typing.Optional[
        typing.Union[
            vertexai.generative_models.GenerativeModel, typing.Callable[[str], str]
        ]
    ] = None
)

A Model-based Pairwise Metric.

A model-based evaluation metric that compares two generative models' responses side by side, letting users A/B test their models to determine which one performs better.

For more details on when to use pairwise metrics, see Evaluation methods and metrics.

Result Details:

* In `EvalResult.summary_metrics`, win rates are computed for both the
baseline and candidate models. A model's win rate is the proportion of
comparisons its response won out of all attempts, expressed as a decimal
between 0 and 1 (see the sketch after this list).

* In `EvalResult.metrics_table`, a pairwise metric produces two
evaluation results per dataset row:
    * `pairwise_choice`: The choice shows whether the candidate model or
      the baseline model performs better, or if they are equally good.
    * `explanation`: The rationale behind each verdict using
      chain-of-thought reasoning. The explanation helps users scrutinize
      the judgment and builds appropriate trust in the decisions.
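For intuition, the win rate is just a proportion over the per-row `pairwise_choice` verdicts. Below is a minimal, hypothetical sketch of that computation in pandas; the verdict label strings (`"CANDIDATE"`, `"BASELINE"`, `"TIE"`) are illustrative assumptions, not necessarily the SDK's exact values:

```
import pandas as pd

# Hypothetical per-row verdicts; in practice these come from the judge model.
metrics_table = pd.DataFrame(
    {"pairwise_choice": ["CANDIDATE", "BASELINE", "CANDIDATE", "TIE"]}
)

total = len(metrics_table)
candidate_win_rate = (metrics_table["pairwise_choice"] == "CANDIDATE").sum() / total
baseline_win_rate = (metrics_table["pairwise_choice"] == "BASELINE").sum() / total

print(candidate_win_rate)  # 0.5: the candidate won 2 of 4 comparisons
print(baseline_win_rate)   # 0.25: the baseline won 1 of 4 comparisons
```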

See [documentation
page](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#understand-results)
for more details on understanding the metric results.

Usage Examples:

```
import pandas as pd

from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples, PairwiseMetric
from vertexai.generative_models import GenerativeModel

baseline_model = GenerativeModel("gemini-1.0-pro")
candidate_model = GenerativeModel("gemini-1.5-pro")

# `metric` is a required keyword argument naming the metric.
pairwise_groundedness = PairwiseMetric(
    metric="pairwise_groundedness",
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_groundedness"
    ),
    baseline_model=baseline_model,
)
eval_dataset = pd.DataFrame({
    "prompt": [...],
})
pairwise_task = EvalTask(
    dataset=eval_dataset,
    metrics=[pairwise_groundedness],
    experiment="my-pairwise-experiment",
)
pairwise_result = pairwise_task.evaluate(
    model=candidate_model,
    experiment_run_name="gemini-pairwise-eval-run",
)
```
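After `evaluate()` returns, the aggregate win rates and per-row verdicts can be read back from the result object. A hedged continuation of the example above; the exact `summary_metrics` key names and the metric-name prefix on the table columns are assumptions that may vary by SDK version:

```
# Aggregate metrics, including win rates as decimals between 0 and 1.
print(pairwise_result.summary_metrics)

# Per-row verdicts and explanations (columns are typically prefixed with
# the metric name, e.g. "pairwise_groundedness/pairwise_choice").
print(pairwise_result.metrics_table)
```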

Methods

PairwiseMetric

PairwiseMetric(
    *,
    metric: str,
    metric_prompt_template: typing.Union[
        vertexai.evaluation.metrics.metric_prompt_template.PairwiseMetricPromptTemplate,
        str,
    ],
    baseline_model: typing.Optional[
        typing.Union[
            vertexai.generative_models.GenerativeModel, typing.Callable[[str], str]
        ]
    ] = None
)

Initializes a pairwise evaluation metric.
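As the signature shows, `baseline_model` also accepts a plain `Callable[[str], str]`, which is useful when the baseline is not a `GenerativeModel` (for example, a legacy system or pre-recorded responses). A minimal sketch under that assumption; `my_legacy_system` is a hypothetical stand-in, not part of the SDK:

```
def baseline_fn(prompt: str) -> str:
    # Hypothetical external system: given the prompt, return the
    # baseline response text to compare against the candidate model.
    return my_legacy_system.generate(prompt)

pairwise_metric = PairwiseMetric(
    metric="pairwise_groundedness",
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_groundedness"
    ),
    baseline_model=baseline_fn,
)
```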