Method: projects.locations.evaluateInstances

Evaluates instances based on a given metric.

HTTP request

POST https://{service-endpoint}/v1beta1/{location}:evaluateInstances

Where {service-endpoint} is one of the supported service endpoints.

Path parameters

Parameters
location

string

Required. The resource name of the Location in which to evaluate the instances. Format: projects/{project}/locations/{location}
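The `location` parameter is a full resource name, so it is nested under `v1beta1/` in the request URL. Below is a minimal Python sketch of building that URL. The helper name `evaluate_instances_url` and the regional endpoint `us-central1-aiplatform.googleapis.com` are illustrative assumptions and do not come from this reference.

```python
from urllib.parse import quote


def evaluate_instances_url(service_endpoint: str, project: str, location: str) -> str:
    """Build the evaluateInstances URL for a project and location (hypothetical helper).

    The {location} path parameter is a full resource name,
    projects/{project}/locations/{location}, whose slashes stay unescaped.
    """
    name = f"projects/{quote(project, safe='')}/locations/{quote(location, safe='')}"
    return f"https://{service_endpoint}/v1beta1/{name}:evaluateInstances"


# Assumed regional endpoint; substitute one of the supported service endpoints.
url = evaluate_instances_url("us-central1-aiplatform.googleapis.com", "my-project", "us-central1")
print(url)
```

The helper escapes only the project and location IDs. The `:evaluateInstances` suffix is the custom method verb and is appended unchanged.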

Request body

The request body contains data with the following structure:

JSON representation
{

  // Union field metric_inputs can be only one of the following:
  "exactMatchInput": {
    object (ExactMatchInput)
  },
  "bleuInput": {
    object (BleuInput)
  },
  "rougeInput": {
    object (RougeInput)
  },
  "fluencyInput": {
    object (FluencyInput)
  },
  "coherenceInput": {
    object (CoherenceInput)
  },
  "safetyInput": {
    object (SafetyInput)
  },
  "groundednessInput": {
    object (GroundednessInput)
  },
  "fulfillmentInput": {
    object (FulfillmentInput)
  },
  "summarizationQualityInput": {
    object (SummarizationQualityInput)
  },
  "pairwiseSummarizationQualityInput": {
    object (PairwiseSummarizationQualityInput)
  },
  "summarizationHelpfulnessInput": {
    object (SummarizationHelpfulnessInput)
  },
  "summarizationVerbosityInput": {
    object (SummarizationVerbosityInput)
  },
  "questionAnsweringQualityInput": {
    object (QuestionAnsweringQualityInput)
  },
  "pairwiseQuestionAnsweringQualityInput": {
    object (PairwiseQuestionAnsweringQualityInput)
  },
  "questionAnsweringRelevanceInput": {
    object (QuestionAnsweringRelevanceInput)
  },
  "questionAnsweringHelpfulnessInput": {
    object (QuestionAnsweringHelpfulnessInput)
  },
  "questionAnsweringCorrectnessInput": {
    object (QuestionAnsweringCorrectnessInput)
  },
  "toolCallValidInput": {
    object (ToolCallValidInput)
  },
  "toolNameMatchInput": {
    object (ToolNameMatchInput)
  },
  "toolParameterKeyMatchInput": {
    object (ToolParameterKeyMatchInput)
  },
  "toolParameterKvMatchInput": {
    object (ToolParameterKVMatchInput)
  }
  // End of list of possible types for union field metric_inputs.
}
Fields
Union field metric_inputs. Instances and specs for evaluation. metric_inputs can be only one of the following:
exactMatchInput

object (ExactMatchInput)

Auto metric instances. Instances and metric spec for exact match metric.

bleuInput

object (BleuInput)

Instances and metric spec for BLEU metric.

rougeInput

object (RougeInput)

Instances and metric spec for ROUGE metric.

fluencyInput

object (FluencyInput)

LLM-based metric instance. General text generation metrics, applicable to other categories. Input for fluency metric.

coherenceInput

object (CoherenceInput)

Input for coherence metric.

safetyInput

object (SafetyInput)

Input for safety metric.

groundednessInput

object (GroundednessInput)

Input for groundedness metric.

fulfillmentInput

object (FulfillmentInput)

Input for fulfillment metric.

summarizationQualityInput

object (SummarizationQualityInput)

Input for summarization quality metric.

pairwiseSummarizationQualityInput

object (PairwiseSummarizationQualityInput)

Input for pairwise summarization quality metric.

summarizationHelpfulnessInput

object (SummarizationHelpfulnessInput)

Input for summarization helpfulness metric.

summarizationVerbosityInput

object (SummarizationVerbosityInput)

Input for summarization verbosity metric.

questionAnsweringQualityInput

object (QuestionAnsweringQualityInput)

Input for question answering quality metric.

pairwiseQuestionAnsweringQualityInput

object (PairwiseQuestionAnsweringQualityInput)

Input for pairwise question answering quality metric.

questionAnsweringRelevanceInput

object (QuestionAnsweringRelevanceInput)

Input for question answering relevance metric.

questionAnsweringHelpfulnessInput

object (QuestionAnsweringHelpfulnessInput)

Input for question answering helpfulness metric.

questionAnsweringCorrectnessInput

object (QuestionAnsweringCorrectnessInput)

Input for question answering correctness metric.

toolCallValidInput

object (ToolCallValidInput)

Tool call metric instances. Input for tool call valid metric.

toolNameMatchInput

object (ToolNameMatchInput)

Input for tool name match metric.

toolParameterKeyMatchInput

object (ToolParameterKeyMatchInput)

Input for tool parameter key match metric.

toolParameterKvMatchInput

object (ToolParameterKVMatchInput)

Input for tool parameter key value match metric.
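Because `metric_inputs` is a union field, a request body carries exactly one of the keys listed above. The following sketch uses plain `dict` payloads and a hypothetical helper, `build_request_body`, to reject a body that sets zero members or more than one before anything is sent.

```python
import json
from typing import Any

# Members of the metric_inputs union field, as listed in the request body.
METRIC_INPUT_FIELDS = frozenset({
    "exactMatchInput", "bleuInput", "rougeInput",
    "fluencyInput", "coherenceInput", "safetyInput",
    "groundednessInput", "fulfillmentInput",
    "summarizationQualityInput", "pairwiseSummarizationQualityInput",
    "summarizationHelpfulnessInput", "summarizationVerbosityInput",
    "questionAnsweringQualityInput", "pairwiseQuestionAnsweringQualityInput",
    "questionAnsweringRelevanceInput", "questionAnsweringHelpfulnessInput",
    "questionAnsweringCorrectnessInput",
    "toolCallValidInput", "toolNameMatchInput",
    "toolParameterKeyMatchInput", "toolParameterKvMatchInput",
})


def build_request_body(**metric_inputs: Any) -> dict[str, Any]:
    """Hypothetical helper: return a body that sets exactly one union member."""
    unknown = set(metric_inputs) - METRIC_INPUT_FIELDS
    if unknown:
        raise ValueError(f"unknown metric_inputs member(s): {sorted(unknown)}")
    if len(metric_inputs) != 1:
        raise ValueError("metric_inputs is a union field; set exactly one member")
    return dict(metric_inputs)


body = build_request_body(
    exactMatchInput={
        "metricSpec": {},  # ExactMatchSpec has no fields
        "instances": [
            {"prediction": "Paris", "reference": "Paris"},
            {"prediction": "paris", "reference": "Paris"},
        ],
    }
)
print(json.dumps(body, indent=2))
```

The resulting JSON would be the POST body for the evaluateInstances URL, sent with an OAuth 2.0 access token that carries the `cloud-platform` scope.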

Response body

Response message for EvaluationService.EvaluateInstances.

If successful, the response body contains data with the following structure:

JSON representation
{

  // Union field evaluation_results can be only one of the following:
  "exactMatchResults": {
    object (ExactMatchResults)
  },
  "bleuResults": {
    object (BleuResults)
  },
  "rougeResults": {
    object (RougeResults)
  },
  "fluencyResult": {
    object (FluencyResult)
  },
  "coherenceResult": {
    object (CoherenceResult)
  },
  "safetyResult": {
    object (SafetyResult)
  },
  "groundednessResult": {
    object (GroundednessResult)
  },
  "fulfillmentResult": {
    object (FulfillmentResult)
  },
  "summarizationQualityResult": {
    object (SummarizationQualityResult)
  },
  "pairwiseSummarizationQualityResult": {
    object (PairwiseSummarizationQualityResult)
  },
  "summarizationHelpfulnessResult": {
    object (SummarizationHelpfulnessResult)
  },
  "summarizationVerbosityResult": {
    object (SummarizationVerbosityResult)
  },
  "questionAnsweringQualityResult": {
    object (QuestionAnsweringQualityResult)
  },
  "pairwiseQuestionAnsweringQualityResult": {
    object (PairwiseQuestionAnsweringQualityResult)
  },
  "questionAnsweringRelevanceResult": {
    object (QuestionAnsweringRelevanceResult)
  },
  "questionAnsweringHelpfulnessResult": {
    object (QuestionAnsweringHelpfulnessResult)
  },
  "questionAnsweringCorrectnessResult": {
    object (QuestionAnsweringCorrectnessResult)
  },
  "toolCallValidResults": {
    object (ToolCallValidResults)
  },
  "toolNameMatchResults": {
    object (ToolNameMatchResults)
  },
  "toolParameterKeyMatchResults": {
    object (ToolParameterKeyMatchResults)
  },
  "toolParameterKvMatchResults": {
    object (ToolParameterKVMatchResults)
  }
  // End of list of possible types for union field evaluation_results.
}
Fields
Union field evaluation_results. Evaluation results are returned in the same order as the instances in EvaluationRequest.instances. evaluation_results can be only one of the following:
exactMatchResults

object (ExactMatchResults)

Auto metric evaluation results. Results for exact match metric.

bleuResults

object (BleuResults)

Results for BLEU metric.

rougeResults

object (RougeResults)

Results for ROUGE metric.

fluencyResult

object (FluencyResult)

LLM-based metric evaluation result. General text generation metrics, applicable to other categories. Result for fluency metric.

coherenceResult

object (CoherenceResult)

Result for coherence metric.

safetyResult

object (SafetyResult)

Result for safety metric.

groundednessResult

object (GroundednessResult)

Result for groundedness metric.

fulfillmentResult

object (FulfillmentResult)

Result for fulfillment metric.

summarizationQualityResult

object (SummarizationQualityResult)

Summarization-only metrics. Result for summarization quality metric.

pairwiseSummarizationQualityResult

object (PairwiseSummarizationQualityResult)

Result for pairwise summarization quality metric.

summarizationHelpfulnessResult

object (SummarizationHelpfulnessResult)

Result for summarization helpfulness metric.

summarizationVerbosityResult

object (SummarizationVerbosityResult)

Result for summarization verbosity metric.

questionAnsweringQualityResult

object (QuestionAnsweringQualityResult)

Question-answering-only metrics. Result for question answering quality metric.

pairwiseQuestionAnsweringQualityResult

object (PairwiseQuestionAnsweringQualityResult)

Result for pairwise question answering quality metric.

questionAnsweringRelevanceResult

object (QuestionAnsweringRelevanceResult)

Result for question answering relevance metric.

questionAnsweringHelpfulnessResult

object (QuestionAnsweringHelpfulnessResult)

Result for question answering helpfulness metric.

questionAnsweringCorrectnessResult

object (QuestionAnsweringCorrectnessResult)

Result for question answering correctness metric.

toolCallValidResults

object (ToolCallValidResults)

Tool call metrics. Results for tool call valid metric.

toolNameMatchResults

object (ToolNameMatchResults)

Results for tool name match metric.

toolParameterKeyMatchResults

object (ToolParameterKeyMatchResults)

Results for tool parameter key match metric.

toolParameterKvMatchResults

object (ToolParameterKVMatchResults)

Results for tool parameter key value match metric.
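Each response contains exactly one `evaluation_results` member, which corresponds to the input member in the request. The sketch below makes two assumptions: the response has already been decoded into a `dict`, and every result member name ends in `Result` or `Results`, which holds for the list above. `unpack_result` and the sample payload are hypothetical.

```python
from typing import Any


def unpack_result(response: dict[str, Any]) -> tuple[str, Any]:
    """Return (member name, value) for the single evaluation_results member present."""
    present = [(k, v) for k, v in response.items() if k.endswith(("Result", "Results"))]
    if len(present) != 1:
        raise ValueError(f"expected one evaluation_results member, found {len(present)}")
    return present[0]


# Hypothetical decoded response for an exactMatchInput request with two instances.
response = {"exactMatchResults": {"exactMatchMetricValues": [{"score": 1}, {"score": 0}]}}
name, value = unpack_result(response)
# Proto3 JSON may omit zero-valued fields, so a missing score is read as 0.0 (assumption).
scores = [v.get("score", 0.0) for v in value["exactMatchMetricValues"]]
print(name, scores)
```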

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ExactMatchInput

Input for exact match metric.

JSON representation
{
  "metricSpec": {
    object (ExactMatchSpec)
  },
  "instances": [
    {
      object (ExactMatchInstance)
    }
  ]
}
Fields
metricSpec

object (ExactMatchSpec)

Required. Spec for exact match metric.

instances[]

object (ExactMatchInstance)

Required. Repeated exact match instances.

ExactMatchSpec

This type has no fields.

Spec for exact match metric: returns 1 if the prediction and reference match exactly, otherwise 0.
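For checking results locally, the spec above can be approximated in a few lines. This sketch assumes plain string equality with no case folding or whitespace trimming. The reference does not say whether the service normalizes text before comparing.

```python
def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings are identical, otherwise 0.0 (no normalization)."""
    return 1.0 if prediction == reference else 0.0


instances = [
    {"prediction": "Paris", "reference": "Paris"},
    {"prediction": "paris", "reference": "Paris"},
]
scores = [exact_match(i["prediction"], i["reference"]) for i in instances]
print(scores)  # [1.0, 0.0]
```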

ExactMatchInstance

Spec for exact match instance.

JSON representation
{
  "prediction": string,
  "reference": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.

BleuInput

Input for BLEU metric.

JSON representation
{
  "metricSpec": {
    object (BleuSpec)
  },
  "instances": [
    {
      object (BleuInstance)
    }
  ]
}
Fields
metricSpec

object (BleuSpec)

Required. Spec for BLEU score metric.

instances[]

object (BleuInstance)

Required. Repeated BLEU instances.

BleuSpec

Spec for BLEU score metric: calculates the precision of n-grams in the prediction as compared to the reference and returns a score ranging from 0 to 1.

JSON representation
{
  "useEffectiveOrder": boolean
}
Fields
useEffectiveOrder

boolean

Optional. Whether to use effective order when computing the BLEU score.
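To show what the BLEU spec measures, here is a minimal, unsmoothed sentence-level BLEU sketch. It computes clipped n-gram precision for orders 1 to 4, combines the precisions by geometric mean, and multiplies by a brevity penalty. Whitespace tokenization and the absence of smoothing are simplifying assumptions, and `useEffectiveOrder` is not modeled. The service's tokenizer and smoothing are not documented here, so its scores may differ.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(prediction: str, reference: str, max_order: int = 4) -> float:
    """Unsmoothed sentence-level BLEU on whitespace tokens; returns a value in [0, 1]."""
    pred, ref = prediction.split(), reference.split()
    if not pred:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_order + 1):
        pred_counts, ref_counts = ngram_counts(pred, n), ngram_counts(ref, n)
        total = sum(pred_counts.values())
        # Clip each prediction n-gram count by how often it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
        if matches == 0:
            return 0.0  # without smoothing, one empty order zeroes the geometric mean
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: predictions shorter than the reference are penalized.
    brevity_penalty = 1.0 if len(pred) > len(ref) else math.exp(1 - len(ref) / len(pred))
    return brevity_penalty * math.exp(log_precision_sum / max_order)


print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))  # ≈ 0.537
```

For short predictions, this unsmoothed form drops to 0 whenever an order has no matches. Sentence-level options such as smoothing or effective order exist to soften that behavior.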

BleuInstance

Spec for BLEU instance.

JSON representation
{
  "prediction": string,
  "reference": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.

RougeInput

Input for ROUGE metric.

JSON representation
{
  "metricSpec": {
    object (RougeSpec)
  },
  "instances": [
    {
      object (RougeInstance)
    }
  ]
}
Fields
metricSpec

object (RougeSpec)

Required. Spec for ROUGE score metric.

instances[]

object (RougeInstance)

Required. Repeated ROUGE instances.

RougeSpec

Spec for ROUGE score metric: calculates the recall of n-grams in the prediction as compared to the reference and returns a score ranging from 0 to 1.

JSON representation
{
  "rougeType": string,
  "useStemmer": boolean,
  "splitSummaries": boolean
}
Fields
rougeType

string

Optional. Supported ROUGE types are rougen[1-9], rougeL, and rougeLsum.

useStemmer

boolean

Optional. Whether to use a stemmer when computing the ROUGE score.

splitSummaries

boolean

Optional. Whether to split summaries while using rougeLsum.
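The recall computation described for `RougeSpec` can be sketched for the n-gram types (rougen[1-9]) as follows. The sketch assumes whitespace tokenization without stemming, which corresponds to `useStemmer` being false. It does not cover the LCS-based `rougeL` and `rougeLsum` types.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count n-grams over whitespace tokens (no stemming, no lowercasing)."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(prediction: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams that also appear in the prediction (clipped counts)."""
    pred, ref = ngram_counts(prediction, n), ngram_counts(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, pred[gram]) for gram, count in ref.items())
    return overlap / total


print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))  # 0.5
print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=2))  # 0.4
```

Many ROUGE libraries also report precision and F-measure. This sketch computes only the recall that the spec describes.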

RougeInstance

Spec for ROUGE instance.

JSON representation
{
  "prediction": string,
  "reference": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.

FluencyInput

Input for fluency metric.

JSON representation
{
  "metricSpec": {
    object (FluencySpec)
  },
  "instance": {
    object (FluencyInstance)
  }
}
Fields
metricSpec

object (FluencySpec)

Required. Spec for fluency score metric.

instance

object (FluencyInstance)

Required. Fluency instance.

FluencySpec

Spec for fluency score metric.

JSON representation
{
  "version": integer
}
Fields
version

integer

Optional. Which version to use for evaluation.

FluencyInstance

Spec for fluency instance.

JSON representation
{
  "prediction": string
}
Fields
prediction

string

Required. Output of the evaluated model.
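Unlike the auto metrics, `FluencyInput` and the other LLM-based inputs below hold a single `instance` instead of an `instances` array. Scoring several predictions therefore means sending one request per prediction. The sketch below uses a hypothetical helper, `fluency_requests`. It assumes that leaving `version` unset lets the service pick its default.

```python
from typing import Any, Optional


def fluency_requests(predictions: list[str], version: Optional[int] = None) -> list[dict[str, Any]]:
    """Build one evaluateInstances body per prediction (hypothetical helper)."""
    # Leaving version unset is assumed to select the service's default version.
    spec: dict[str, Any] = {} if version is None else {"version": version}
    return [
        {"fluencyInput": {"metricSpec": dict(spec), "instance": {"prediction": p}}}
        for p in predictions
    ]


bodies = fluency_requests(["The cat sat on the mat.", "Cat mat the on sat."])
print(len(bodies))  # 2
```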

CoherenceInput

Input for coherence metric.

JSON representation
{
  "metricSpec": {
    object (CoherenceSpec)
  },
  "instance": {
    object (CoherenceInstance)
  }
}
Fields
metricSpec

object (CoherenceSpec)

Required. Spec for coherence score metric.

instance

object (CoherenceInstance)

Required. Coherence instance.

CoherenceSpec

Spec for coherence score metric.

JSON representation
{
  "version": integer
}
Fields
version

integer

Optional. Which version to use for evaluation.

CoherenceInstance

Spec for coherence instance.

JSON representation
{
  "prediction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

SafetyInput

Input for safety metric.

JSON representation
{
  "metricSpec": {
    object (SafetySpec)
  },
  "instance": {
    object (SafetyInstance)
  }
}
Fields
metricSpec

object (SafetySpec)

Required. Spec for safety metric.

instance

object (SafetyInstance)

Required. Safety instance.

SafetySpec

Spec for safety metric.

JSON representation
{
  "version": integer
}
Fields
version

integer

Optional. Which version to use for evaluation.

SafetyInstance

Spec for safety instance.

JSON representation
{
  "prediction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

GroundednessInput

Input for groundedness metric.

JSON representation
{
  "metricSpec": {
    object (GroundednessSpec)
  },
  "instance": {
    object (GroundednessInstance)
  }
}
Fields
metricSpec

object (GroundednessSpec)

Required. Spec for groundedness metric.

instance

object (GroundednessInstance)

Required. Groundedness instance.

GroundednessSpec

Spec for groundedness metric.

JSON representation
{
  "version": integer
}
Fields
version

integer

Optional. Which version to use for evaluation.

GroundednessInstance

Spec for groundedness instance.

JSON representation
{
  "prediction": string,
  "context": string
}
Fields
prediction

string

Required. Output of the evaluated model.

context

string

Required. Background information provided as context, used to compare against the prediction.

FulfillmentInput

Input for fulfillment metric.

JSON representation
{
  "metricSpec": {
    object (FulfillmentSpec)
  },
  "instance": {
    object (FulfillmentInstance)
  }
}
Fields
metricSpec

object (FulfillmentSpec)

Required. Spec for fulfillment score metric.

instance

object (FulfillmentInstance)

Required. Fulfillment instance.

FulfillmentSpec

Spec for fulfillment metric.

JSON representation
{
  "version": integer
}
Fields
version

integer

Optional. Which version to use for evaluation.

FulfillmentInstance

Spec for fulfillment instance.

JSON representation
{
  "prediction": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

instruction

string

Required. Inference instruction prompt to compare the prediction against.

SummarizationQualityInput

Input for summarization quality metric.

JSON representation
{
  "metricSpec": {
    object (SummarizationQualitySpec)
  },
  "instance": {
    object (SummarizationQualityInstance)
  }
}
Fields
metricSpec

object (SummarizationQualitySpec)

Required. Spec for summarization quality score metric.

instance

object (SummarizationQualityInstance)

Required. Summarization quality instance.

SummarizationQualitySpec

Spec for summarization quality score metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute summarization quality.

version

integer

Optional. Which version to use for evaluation.

SummarizationQualityInstance

Spec for summarization quality instance.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Required. Text to be summarized.

instruction

string

Required. Summarization prompt for the LLM.

PairwiseSummarizationQualityInput

Input for pairwise summarization quality metric.

JSON representation
{
  "metricSpec": {
    object (PairwiseSummarizationQualitySpec)
  },
  "instance": {
    object (PairwiseSummarizationQualityInstance)
  }
}
Fields
metricSpec

object (PairwiseSummarizationQualitySpec)

Required. Spec for pairwise summarization quality score metric.

instance

object (PairwiseSummarizationQualityInstance)

Required. Pairwise summarization quality instance.

PairwiseSummarizationQualitySpec

Spec for pairwise summarization quality score metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute pairwise summarization quality.

version

integer

Optional. Which version to use for evaluation.

PairwiseSummarizationQualityInstance

Spec for pairwise summarization quality instance.

JSON representation
{
  "prediction": string,
  "baselinePrediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the candidate model.

baselinePrediction

string

Required. Output of the baseline model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Required. Text to be summarized.

instruction

string

Required. Summarization prompt for the LLM.
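In the pairwise inputs, `prediction` holds the candidate model's output and `baselinePrediction` holds the baseline model's output. Both are judged against the same `context` and `instruction`. The illustrative body below uses invented texts. It omits `reference` and leaves `useReference` unset, on the assumption that this is acceptable because both are marked optional.

```python
import json

pairwise_request = {
    "pairwiseSummarizationQualityInput": {
        "metricSpec": {},  # useReference and version left unset (both optional)
        "instance": {
            "prediction": "Revenue rose 12% on strong cloud sales.",   # candidate model
            "baselinePrediction": "The company reported results.",     # baseline model
            "context": "Quarterly report: revenue grew 12%, driven by cloud services.",
            "instruction": "Summarize the report in one sentence.",
        },
    }
}
print(json.dumps(pairwise_request, indent=2))
```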

SummarizationHelpfulnessInput

Input for summarization helpfulness metric.

JSON representation
{
  "metricSpec": {
    object (SummarizationHelpfulnessSpec)
  },
  "instance": {
    object (SummarizationHelpfulnessInstance)
  }
}
Fields
metricSpec

object (SummarizationHelpfulnessSpec)

Required. Spec for summarization helpfulness score metric.

instance

object (SummarizationHelpfulnessInstance)

Required. Summarization helpfulness instance.

SummarizationHelpfulnessSpec

Spec for summarization helpfulness score metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute summarization helpfulness.

version

integer

Optional. Which version to use for evaluation.

SummarizationHelpfulnessInstance

Spec for summarization helpfulness instance.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Required. Text to be summarized.

instruction

string

Optional. Summarization prompt for the LLM.

SummarizationVerbosityInput

Input for summarization verbosity metric.

JSON representation
{
  "metricSpec": {
    object (SummarizationVerbositySpec)
  },
  "instance": {
    object (SummarizationVerbosityInstance)
  }
}
Fields
metricSpec

object (SummarizationVerbositySpec)

Required. Spec for summarization verbosity score metric.

instance

object (SummarizationVerbosityInstance)

Required. Summarization verbosity instance.

SummarizationVerbositySpec

Spec for summarization verbosity score metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute summarization verbosity.

version

integer

Optional. Which version to use for evaluation.

SummarizationVerbosityInstance

Spec for summarization verbosity instance.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Required. Text to be summarized.

instruction

string

Optional. Summarization prompt for the LLM.

QuestionAnsweringQualityInput

Input for question answering quality metric.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringQualitySpec)
  },
  "instance": {
    object (QuestionAnsweringQualityInstance)
  }
}
Fields
metricSpec

object (QuestionAnsweringQualitySpec)

Required. Spec for question answering quality score metric.

instance

object (QuestionAnsweringQualityInstance)

Required. Question answering quality instance.

QuestionAnsweringQualitySpec

Spec for question answering quality score metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute question answering quality.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringQualityInstance

Spec for question answering quality instance.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Required. Text used to answer the question.

instruction

string

Required. Question answering prompt for the LLM.

PairwiseQuestionAnsweringQualityInput

Input for pairwise question answering quality metric.

JSON representation
{
  "metricSpec": {
    object (PairwiseQuestionAnsweringQualitySpec)
  },
  "instance": {
    object (PairwiseQuestionAnsweringQualityInstance)
  }
}
Fields
metricSpec

object (PairwiseQuestionAnsweringQualitySpec)

Required. Spec for pairwise question answering quality score metric.

instance

object (PairwiseQuestionAnsweringQualityInstance)

Required. Pairwise question answering quality instance.

PairwiseQuestionAnsweringQualitySpec

Spec for pairwise question answering quality score metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute pairwise question answering quality.

version

integer

Optional. Which version to use for evaluation.

PairwiseQuestionAnsweringQualityInstance

Spec for pairwise question answering quality instance.

JSON representation
{
  "prediction": string,
  "baselinePrediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the candidate model.

baselinePrediction

string

Required. Output of the baseline model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Required. Text used to answer the question.

instruction

string

Required. Question answering prompt for the LLM.

QuestionAnsweringRelevanceInput

Input for question answering relevance metric.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringRelevanceSpec)
  },
  "instance": {
    object (QuestionAnsweringRelevanceInstance)
  }
}
Fields
metricSpec

object (QuestionAnsweringRelevanceSpec)

Required. Spec for question answering relevance score metric.

instance

object (QuestionAnsweringRelevanceInstance)

Required. Question answering relevance instance.

QuestionAnsweringRelevanceSpec

Spec for question answering relevance metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute question answering relevance.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringRelevanceInstance

Spec for question answering relevance instance.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Optional. Text provided as context to answer the question.

instruction

string

Required. The question asked and any other instructions in the inference prompt.

QuestionAnsweringHelpfulnessInput

Input for question answering helpfulness metric.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringHelpfulnessSpec)
  },
  "instance": {
    object (QuestionAnsweringHelpfulnessInstance)
  }
}
Fields
metricSpec

object (QuestionAnsweringHelpfulnessSpec)

Required. Spec for question answering helpfulness score metric.

instance

object (QuestionAnsweringHelpfulnessInstance)

Required. Question answering helpfulness instance.

QuestionAnsweringHelpfulnessSpec

Spec for question answering helpfulness metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute question answering helpfulness.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringHelpfulnessInstance

Spec for question answering helpfulness instance.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Optional. Text provided as context to answer the question.

instruction

string

Required. The question asked and any other instructions in the inference prompt.

QuestionAnsweringCorrectnessInput

Input for question answering correctness metric.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringCorrectnessSpec)
  },
  "instance": {
    object (QuestionAnsweringCorrectnessInstance)
  }
}
Fields
metricSpec

object (QuestionAnsweringCorrectnessSpec)

Required. Spec for question answering correctness score metric.

instance

object (QuestionAnsweringCorrectnessInstance)

Required. Question answering correctness instance.

QuestionAnsweringCorrectnessSpec

Spec for question answering correctness metric.

JSON representation
{
  "useReference": boolean,
  "version": integer
}
Fields
useReference

boolean

Optional. Whether to use instance.reference to compute question answering correctness.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringCorrectnessInstance

Spec for question answering correctness instance.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Optional. Ground truth used to compare against the prediction.

context

string

Optional. Text provided as context to answer the question.

instruction

string

Required. The question asked and any other instructions in the inference prompt.
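The LLM-based instances differ mainly in which fields are required. The table below restates the Required markers from the instance messages above, so a client can check an instance before sending it. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names. Treating an empty string as missing is an assumption, and the table should be rechecked if the API changes.

```python
from typing import Any

# Required instance fields per LLM-based metric input, transcribed from the
# "Required." markers in this reference (re-check if the API changes).
REQUIRED_FIELDS: dict[str, frozenset[str]] = {
    "fluencyInput": frozenset({"prediction"}),
    "coherenceInput": frozenset({"prediction"}),
    "safetyInput": frozenset({"prediction"}),
    "groundednessInput": frozenset({"prediction", "context"}),
    "fulfillmentInput": frozenset({"prediction", "instruction"}),
    "summarizationQualityInput": frozenset({"prediction", "context", "instruction"}),
    "pairwiseSummarizationQualityInput": frozenset(
        {"prediction", "baselinePrediction", "context", "instruction"}),
    "summarizationHelpfulnessInput": frozenset({"prediction", "context"}),
    "summarizationVerbosityInput": frozenset({"prediction", "context"}),
    "questionAnsweringQualityInput": frozenset({"prediction", "context", "instruction"}),
    "pairwiseQuestionAnsweringQualityInput": frozenset(
        {"prediction", "baselinePrediction", "context", "instruction"}),
    "questionAnsweringRelevanceInput": frozenset({"prediction", "instruction"}),
    "questionAnsweringHelpfulnessInput": frozenset({"prediction", "instruction"}),
    "questionAnsweringCorrectnessInput": frozenset({"prediction", "instruction"}),
}


def missing_fields(metric_input: str, instance: dict[str, Any]) -> list[str]:
    """Sorted required fields that are absent, None, or empty in the instance."""
    return sorted(field for field in REQUIRED_FIELDS[metric_input] if not instance.get(field))


print(missing_fields("groundednessInput", {"prediction": "It is in Paris."}))  # ['context']
```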

ToolCallValidInput

Input for tool call valid metric.

JSON representation
{
  "metricSpec": {
    object (ToolCallValidSpec)
  },
  "instances": [
    {
      object (ToolCallValidInstance)
    }
  ]
}
Fields
metricSpec

object (ToolCallValidSpec)

Required. Spec for tool call valid metric.

instances[]

object (ToolCallValidInstance)

Required. Repeated tool call valid instances.

ToolCallValidSpec

This type has no fields.

Spec for tool call valid metric.

ToolCallValidInstance

Spec for tool call valid instance.

JSON representation
{
  "prediction": string,
  "reference": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.

ToolNameMatchInput

Input for tool name match metric.

JSON representation
{
  "metricSpec": {
    object (ToolNameMatchSpec)
  },
  "instances": [
    {
      object (ToolNameMatchInstance)
    }
  ]
}
Fields
metricSpec

object (ToolNameMatchSpec)

Required. Spec for tool name match metric.

instances[]

object (ToolNameMatchInstance)

Required. Repeated tool name match instances.

ToolNameMatchSpec

This type has no fields.

Spec for tool name match metric.

ToolNameMatchInstance

Spec for tool name match instance.

JSON representation
{
  "prediction": string,
  "reference": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.

ToolParameterKeyMatchInput

Input for tool parameter key match metric.

JSON representation
{
  "metricSpec": {
    object (ToolParameterKeyMatchSpec)
  },
  "instances": [
    {
      object (ToolParameterKeyMatchInstance)
    }
  ]
}
Fields
metricSpec

object (ToolParameterKeyMatchSpec)

Required. Spec for tool parameter key match metric.

instances[]

object (ToolParameterKeyMatchInstance)

Required. Repeated tool parameter key match instances.

ToolParameterKeyMatchSpec

This type has no fields.

Spec for tool parameter key match metric.

ToolParameterKeyMatchInstance

Spec for tool parameter key match instance.

JSON representation
{
  "prediction": string,
  "reference": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.
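The tool-call metrics compare `prediction` and `reference` strings that describe tool calls, but this reference does not define their format. The sketch below assumes each string is JSON of the form `{"content": "...", "tool_calls": [{"name": ..., "arguments": {...}}]}` and compares only the first tool call. It also assumes a binary (0 or 1) score. The format, the scoring, and the helper names are all assumptions made for illustration.

```python
import json
from typing import Any


def first_tool_call(text: str) -> dict[str, Any]:
    """Return the first tool call in the assumed JSON format, or {} if there is none."""
    calls = json.loads(text).get("tool_calls") or []
    return calls[0] if calls else {}


def tool_name_match(prediction: str, reference: str) -> float:
    """1.0 if the first tool calls name the same tool, else 0.0."""
    pred_name = first_tool_call(prediction).get("name")
    ref_name = first_tool_call(reference).get("name")
    return 1.0 if pred_name is not None and pred_name == ref_name else 0.0


def tool_parameter_key_match(prediction: str, reference: str) -> float:
    """1.0 if the first tool calls use the same argument keys, else 0.0."""
    pred_keys = set(first_tool_call(prediction).get("arguments", {}))
    ref_keys = set(first_tool_call(reference).get("arguments", {}))
    return 1.0 if pred_keys == ref_keys else 0.0


reference = json.dumps({"content": "", "tool_calls": [
    {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}]})
prediction = json.dumps({"content": "", "tool_calls": [
    {"name": "get_weather", "arguments": {"city": "paris"}}]})
print(tool_name_match(prediction, reference), tool_parameter_key_match(prediction, reference))  # 1.0 0.0
```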

ToolParameterKVMatchInput

Input for tool parameter key value match metric.

JSON representation
{
  "metricSpec": {
    object (ToolParameterKVMatchSpec)
  },
  "instances": [
    {
      object (ToolParameterKVMatchInstance)
    }
  ]
}
Fields
metricSpec

object (ToolParameterKVMatchSpec)

Required. Spec for tool parameter key value match metric.

instances[]

object (ToolParameterKVMatchInstance)

Required. Repeated tool parameter key value match instances.

ToolParameterKVMatchSpec

Spec for tool parameter key value match metric.

JSON representation
{
  "useStrictStringMatch": boolean
}
Fields
useStrictStringMatch

boolean

Optional. Whether to use strict string matching on parameter values.

ToolParameterKVMatchInstance

Spec for tool parameter key value match instance.

JSON representation
{
  "prediction": string,
  "reference": string
}
Fields
prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.
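The `useStrictStringMatch` flag on `ToolParameterKVMatchSpec` controls how parameter values are compared. The sketch below uses the same assumed JSON tool-call format. It treats strict matching as exact equality. For the non-strict mode, it substitutes a hypothetical rule: convert values to strings, trim whitespace, and case-fold before comparing. The service's actual non-strict rule is not documented here.

```python
import json
from typing import Any


def _arguments(text: str) -> dict[str, Any]:
    """Arguments of the first tool call in the assumed JSON format, or {}."""
    calls = json.loads(text).get("tool_calls") or []
    return calls[0].get("arguments", {}) if calls else {}


def tool_parameter_kv_match(prediction: str, reference: str,
                            use_strict_string_match: bool = False) -> float:
    """1.0 if keys and values of the first tool calls match, else 0.0 (binary score assumed)."""
    pred, ref = _arguments(prediction), _arguments(reference)
    if pred.keys() != ref.keys():
        return 0.0
    for key, ref_value in ref.items():
        pred_value = pred[key]
        if use_strict_string_match:
            same = pred_value == ref_value
        else:
            # Hypothetical relaxed comparison; not the documented service behavior.
            same = str(pred_value).strip().casefold() == str(ref_value).strip().casefold()
        if not same:
            return 0.0
    return 1.0


ref = json.dumps({"tool_calls": [{"name": "get_weather", "arguments": {"city": "Paris"}}]})
pred = json.dumps({"tool_calls": [{"name": "get_weather", "arguments": {"city": "paris "}}]})
print(tool_parameter_kv_match(pred, ref), tool_parameter_kv_match(pred, ref, use_strict_string_match=True))  # 1.0 0.0
```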

ExactMatchResults

Results for exact match metric.

JSON representation
{
  "exactMatchMetricValues": [
    {
      object (ExactMatchMetricValue)
    }
  ]
}
Fields
exactMatchMetricValues[]

object (ExactMatchMetricValue)

Output only. Exact match metric values.

ExactMatchMetricValue

Exact match metric value for an instance.

JSON representation
{
  "score": number
}
Fields
score

number

Output only. Exact match score.
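The response returns one `ExactMatchMetricValue` per instance, in request order. `ExactMatchResults` has no aggregate field, so any overall score must be computed on the client. The sketch below computes accuracy as the mean score over a hypothetical payload. Reading an omitted `score` as 0.0 assumes proto3's JSON convention of dropping zero values.

```python
from statistics import mean
from typing import Any


def exact_match_accuracy(results: dict[str, Any]) -> float:
    """Mean exact match score across instances (0.0 for an empty list)."""
    values = results.get("exactMatchMetricValues", [])
    scores = [v.get("score", 0.0) for v in values]
    return mean(scores) if scores else 0.0


# Hypothetical ExactMatchResults payload; {} stands for an entry whose score was omitted.
results = {"exactMatchMetricValues": [{"score": 1.0}, {}, {"score": 1.0}, {"score": 0.0}]}
print(exact_match_accuracy(results))  # 0.5
```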

BleuResults

Results for BLEU metric.

JSON representation
{
  "bleuMetricValues": [
    {
      object (BleuMetricValue)
    }
  ]
}
Fields
bleuMetricValues[]

object (BleuMetricValue)

Output only. BLEU metric values.

BleuMetricValue

BLEU metric value for an instance.

JSON representation
{
  "score": number
}
Fields
score

number

Output only. BLEU score.

RougeResults

Results for ROUGE metric.

JSON representation
{
  "rougeMetricValues": [
    {
      object (RougeMetricValue)
    }
  ]
}
Fields
rougeMetricValues[]

object (RougeMetricValue)

Output only. ROUGE metric values.

RougeMetricValue

ROUGE metric value for an instance.

JSON representation
{
  "score": number
}
Fields
score

number

Output only. ROUGE score.

FluencyResult

Spec for fluency result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for fluency score.

score

number

Output only. Fluency score.

confidence

number

Output only. Confidence for fluency score.

CoherenceResult

Spec for coherence result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for coherence score.

score

number

Output only. Coherence score.

confidence

number

Output only. Confidence for the coherence score.

SafetyResult

Spec for safety result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for safety score.

score

number

Output only. Safety score.

confidence

number

Output only. Confidence for the safety score.

GroundednessResult

Spec for groundedness result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for groundedness score.

score

number

Output only. Groundedness score.

confidence

number

Output only. Confidence for the groundedness score.

FulfillmentResult

Spec for fulfillment result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for fulfillment score.

score

number

Output only. Fulfillment score.

confidence

number

Output only. Confidence for the fulfillment score.

SummarizationQualityResult

Spec for summarization quality result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for summarization quality score.

score

number

Output only. Summarization quality score.

confidence

number

Output only. Confidence for the summarization quality score.

PairwiseSummarizationQualityResult

Spec for pairwise summarization quality result.

JSON representation
{
  "pairwiseChoice": enum (PairwiseChoice),
  "explanation": string,
  "confidence": number
}
Fields
pairwiseChoice

enum (PairwiseChoice)

Output only. Pairwise summarization prediction choice.

explanation

string

Output only. Explanation for the pairwise summarization quality choice.

confidence

number

Output only. Confidence for the pairwise summarization quality choice.

PairwiseChoice

Pairwise prediction autorater preference.

Enums
PAIRWISE_CHOICE_UNSPECIFIED Unspecified prediction choice.
BASELINE Baseline prediction wins.
CANDIDATE Candidate prediction wins.
TIE Winner cannot be determined.
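Pairwise results report a PairwiseChoice rather than a numeric score, so aggregating them means counting outcomes. The sketch below mirrors the enum values listed above and computes a candidate win rate over decided comparisons. Excluding TIE and PAIRWISE_CHOICE_UNSPECIFIED from the denominator is a design choice made for this illustration, and the helper name is hypothetical. A missing pairwiseChoice is read as PAIRWISE_CHOICE_UNSPECIFIED, since the proto3 JSON mapping omits enum fields left at their default.

```python
from collections import Counter
from enum import Enum


class PairwiseChoice(str, Enum):
    PAIRWISE_CHOICE_UNSPECIFIED = "PAIRWISE_CHOICE_UNSPECIFIED"
    BASELINE = "BASELINE"
    CANDIDATE = "CANDIDATE"
    TIE = "TIE"


def candidate_win_rate(results) -> float:
    """Fraction of decided comparisons won by the candidate (ties excluded)."""
    counts = Counter(
        PairwiseChoice(r.get("pairwiseChoice", "PAIRWISE_CHOICE_UNSPECIFIED"))
        for r in results
    )
    decided = counts[PairwiseChoice.BASELINE] + counts[PairwiseChoice.CANDIDATE]
    return counts[PairwiseChoice.CANDIDATE] / decided if decided else 0.0


results = [
    {"pairwiseChoice": "CANDIDATE"},
    {"pairwiseChoice": "BASELINE"},
    {"pairwiseChoice": "CANDIDATE"},
    {"pairwiseChoice": "TIE"},
]
print(round(candidate_win_rate(results), 3))  # 0.667
```

Counting ties as half a win is a common alternative; which convention fits depends on how the comparison will be reported.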

SummarizationHelpfulnessResult

Spec for summarization helpfulness result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for summarization helpfulness score.

score

number

Output only. Summarization helpfulness score.

confidence

number

Output only. Confidence for the summarization helpfulness score.

SummarizationVerbosityResult

Spec for summarization verbosity result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for summarization verbosity score.

score

number

Output only. Summarization verbosity score.

confidence

number

Output only. Confidence for the summarization verbosity score.

QuestionAnsweringQualityResult

Spec for question answering quality result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for question answering quality score.

score

number

Output only. Question answering quality score.

confidence

number

Output only. Confidence for the question answering quality score.

PairwiseQuestionAnsweringQualityResult

Spec for pairwise question answering quality result.

JSON representation
{
  "pairwiseChoice": enum (PairwiseChoice),
  "explanation": string,
  "confidence": number
}
Fields
pairwiseChoice

enum (PairwiseChoice)

Output only. Pairwise question answering prediction choice.

explanation

string

Output only. Explanation for the pairwise question answering quality choice.

confidence

number

Output only. Confidence for the pairwise question answering quality choice.

QuestionAnsweringRelevanceResult

Spec for question answering relevance result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for question answering relevance score.

score

number

Output only. Question answering relevance score.

confidence

number

Output only. Confidence for the question answering relevance score.

QuestionAnsweringHelpfulnessResult

Spec for question answering helpfulness result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for question answering helpfulness score.

score

number

Output only. Question answering helpfulness score.

confidence

number

Output only. Confidence for the question answering helpfulness score.

QuestionAnsweringCorrectnessResult

Spec for question answering correctness result.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}
Fields
explanation

string

Output only. Explanation for question answering correctness score.

score

number

Output only. Question answering correctness score.

confidence

number

Output only. Confidence for the question answering correctness score.

ToolCallValidResults

Results for the tool call valid metric.

JSON representation
{
  "toolCallValidMetricValues": [
    {
      object (ToolCallValidMetricValue)
    }
  ]
}
Fields
toolCallValidMetricValues[]

object (ToolCallValidMetricValue)

Output only. Tool call valid metric values.

ToolCallValidMetricValue

Tool call valid metric value for an instance.

JSON representation
{
  "score": number
}
Fields
score

number

Output only. Tool call valid score.
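ToolCallValidResults, like ExactMatchResults and BleuResults, wraps a list of per-instance values that each carry a single score, so a dataset-level number is typically an average over that list. The sketch below is a hypothetical helper that averages any such list by field name. It treats a missing "score" as 0 because the proto3 JSON mapping omits numeric fields left at their default.

```python
from statistics import mean


def mean_metric_score(results: dict, values_field: str) -> float:
    """Averages per-instance scores from a *Results object (hypothetical helper)."""
    values = results.get(values_field, [])
    if not values:
        raise ValueError(f"no {values_field} in results")
    # A missing "score" means 0: proto3 JSON omits numeric fields at their default.
    return mean(v.get("score", 0.0) for v in values)


tool_call_valid = {"toolCallValidMetricValues": [{"score": 1}, {}, {"score": 1}, {"score": 1}]}
print(mean_metric_score(tool_call_valid, "toolCallValidMetricValues"))  # 0.75
```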

ToolNameMatchResults

Results for the tool name match metric.

JSON representation
{
  "toolNameMatchMetricValues": [
    {
      object (ToolNameMatchMetricValue)
    }
  ]
}
Fields
toolNameMatchMetricValues[]

object (ToolNameMatchMetricValue)

Output only. Tool name match metric values.
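To illustrate the idea behind a tool name match, the sketch below extracts tool names from a prediction and a reference and scores 1.0 when they agree. It assumes the same JSON payload shape used in the earlier ToolParameterKVMatchInstance sketch, and it compares the full ordered list of names. Both choices are assumptions; the service may, for example, inspect only some of the calls.

```python
import json


def tool_names(payload: str) -> list:
    # Assumed payload shape: {"content": ..., "tool_calls": [{"name": ..., "arguments": ...}]}
    return [call.get("name") for call in json.loads(payload).get("tool_calls", [])]


def tool_name_match(prediction: str, reference: str) -> float:
    """Illustrative only: 1.0 if the tool names agree in order, else 0.0."""
    return 1.0 if tool_names(prediction) == tool_names(reference) else 0.0


pred = json.dumps({"content": "", "tool_calls": [{"name": "get_weather", "arguments": {"city": "Paris"}}]})
ref = json.dumps({"content": "", "tool_calls": [{"name": "get_forecast", "arguments": {"city": "Paris"}}]})
print(tool_name_match(pred, ref))  # 0.0
```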

ToolNameMatchMetricValue

Tool name match metric value for an instance.

JSON representation
{
  "score": number
}
Fields