Details for managed rubric-based metrics

This page provides a full list of managed rubric-based metrics offered by the Gen AI evaluation service, which you can use in the GenAI Client in Vertex AI SDK.

For more information about test-driven evaluation, see Define your evaluation metrics.

Overview

The Gen AI evaluation service offers a list of managed rubric-based metrics for the test-driven evaluation framework:

For metrics with adaptive rubrics, most of them include both the workflow for rubric generation for each prompt and rubric validation. You can run them separately if needed. See Run an evaluation for details.
For metrics with static rubrics, no per-prompt rubrics are generated. For details regarding the intended outputs, see Metric details.

Each managed rubric-based metric has a versioning number. The metric uses the latest version by default, but you can pin to a specific version if needed:

from vertexai import types

text_quality_metric = types.RubricMetric.TEXT_QUALITY
general_quality_v1 = types.RubricMetric.GENERAL_QUALITY(version='v1')

Backward compatibility

For metrics offered as a Metric prompt templates, you can still access the pointwise metrics through the GenAI Client in Vertex AI SDK through the same approach. Pairwise metrics are not supported by the GenAI Client in Vertex AI SDK, but see Run an evaluation to compare two models in the same evaluation.

from vertexai import types

# Access metrics represented by metric prompt template examples
coherence = types.RubricMetric.COHERENCE
fluency = types.RubricMetric.FLUENCY

Managed metrics details

This section lists managed metrics with details such as their type, required inputs, and expected output:

General quality
Text quality
Instruction following
Grounding
Safety
Multi-turn general quality
Multi-turn text quality
Agent final response match
Agent final response reference free

General quality

Latest version	`general_quality_v1`
Type	Adaptive rubrics
Description	A comprehensive adaptive rubrics metric that evaluates the overall quality of a model's response. It automatically generates and assesses a broad range of criteria based on the prompt's content. This is the recommended starting point for most evaluations.
How to access in SDK	`types.RubricMetric.GENERAL_QUALITY`
Input	`prompt` `response` (Optional) `rubric_groups` If you have rubrics already generated, you can provide them directly for evaluation.
Output	`score` `rubrics` and corresponding `verdicts` The score represents the passing rate of the response based on the rubrics.
Number of LLM calls	6 calls to Gemini 2.5 Flash

Text quality

Latest version	`text_quality_v1`
Type	Adaptive rubrics
Description	A targeted adaptive rubrics metric that specifically evaluates the linguistic quality of the response. It assesses aspects like fluency, coherence, and grammar.
How to access in SDK	`types.RubricMetric.TEXT_QUALITY`
Input	`prompt` `response` (Optional) `rubric_groups` If you have rubrics already generated, you can provide them directly for evaluation.
Output	`score` `rubrics` and corresponding `verdicts` The score represents the passing rate of the response based on the rubrics.
Number of LLM calls	6 calls to Gemini 2.5 Flash

Instruction following

Latest version	`instruction_following_v1`
Type	Adaptive rubrics
Description	A targeted adaptive rubrics metric that measures how well the response adheres to the specific constraints and instructions given in the prompt.
How to access in SDK	`types.RubricMetric.INSTRUCTION_FOLLOWING`
Input	`prompt` `response` (Optional) `rubric_groups` If you have rubrics already generated, You can provide them directly for evaluation.
Output	`score` (passing rate) `rubrics` and corresponding `verdicts` The score represents the passing rate of the response based on the rubrics.
Number of LLM calls	6 calls to Gemini 2.5 Flash

Grounding

Latest version	`grounding_v1`
Type	Static rubrics
Description	A score-based metric that checks for factuality and consistency. It verifies that the model's response is grounded based on the context.
How to access in SDK	`types.RubricMetric.GROUNDING`
Input	`prompt` `response` `context`
Output	`score` `explanation` The score has a range of `0-1`, and represents the rate of claims labeled as `supported` or `no_rad` (not requiring factual attributions, such as greetings, questions, or disclaimers) to the input prompt. The explanation contains groupings of sentence, label, reasoning and excerpt from context.
Number of LLM calls	1 call to Gemini 2.5 Flash

Safety

Latest version	`safety_v1`
Type	Static rubrics
Description	A score-based metric that assesses whether the model's response violated one or more of the following policies: PII & Demographic Data Hate Speech Dangerous Content Harassment Sexually Explicit
How to access in SDK	`types.RubricMetric.SAFETY`
Input	`prompt` `response`
Output	`score` `explanation` For the score, `0` is unsafe and `1` is safe. The explanation field includes violated policies.
Number of LLM calls	10 calls to Gemini 2.5 Flash

Multi-turn general quality

Latest version	`multi_turn_general_quality_v1`
Type	Adaptive rubrics
Description	An adaptive rubrics metric that evaluates the overall quality of a model's response within the context of a multi-turn dialogue.
How to access in SDK	`types.RubricMetric.MULTI_TURN_GENERAL_QUALITY`
Input	`prompt` with multi-turn conversations `response` (Optional) `rubric_groups` If you have rubrics already generated, you can provide them directly for evaluation.
Output	`score` rubrics and corresponding verdicts The score represents the passing rate of the response based on the rubrics.
Number of LLM calls	6 calls to Gemini 2.5 Flash

Multi-turn text quality

Latest version	`multi_turn_text_quality_v1`
Type	Adaptive rubrics
Description	An adaptive rubrics metric that evaluates the text quality of a model's response within the context of a multi-turn dialogue.
How to access in SDK	`types.RubricMetric.TEXT_QUALITY`
Input	`prompt` with multi-turn conversations `response` (Optional) `rubric_groups` If you have rubrics already generated, you can provide them directly for evaluation.
Output	`score` `rubrics` and corresponding `verdicts` The score represents the passing rate of the response based on the rubrics.
Number of LLM calls	6 calls to Gemini 2.5 Flash

Agent final response match

Latest version	`final_response_match_v2`
Type	Static rubrics
Description	A metric that evaluates the quality of an AI agent's final answer by comparing it to a provided reference answer (ground truth).
How to access in SDK	`types.RubricMetric.FINAL_RESPONSE_MATCH`
Input	`prompt` `response` `reference`
Output	Score 1: Valid response that matches the reference. 0: Invalid response that does not match the reference. Explanation
Number of LLM calls	5 calls to Gemini 2.5 Flash

Agent final response reference free

Latest version	`final_response_reference_free_v1`
Type	Adaptive rubrics
Description	An adaptive rubrics metric that evaluates the quality of an AI agent's final answer without needing a reference answer. You need to provide rubrics for this metric, as it doesn't support auto-generated rubrics.
How to access in SDK	`types.RubricMetric.FINAL_RESPONSE_REFERENCE_FREE`
Input	`prompt` `response` `rubric_groups`
Output	`score` `rubrics` and corresponding `verdicts` The score represents the passing rate of the response based on the rubrics.
Number of LLM calls	5 calls to Gemini 2.5 Flash

What's next

Prepare your evaluation dataset.