Index
BoundingBoxMetrics (message)
BoundingBoxMetrics.ConfidenceMetrics (message)
ClassificationEvaluationMetrics (message)
ClassificationEvaluationMetrics.ConfidenceMetrics (message)
ConfusionMatrix (message)
ConfusionMatrix.AnnotationSpecRef (message)
ForecastingEvaluationMetrics (message)
ForecastingEvaluationMetrics.QuantileMetricsEntry (message)
GeneralTextGenerationEvaluationMetrics (message)
ImageObjectDetectionEvaluationMetrics (message)
ImageSegmentationEvaluationMetrics (message)
ImageSegmentationEvaluationMetrics.ConfidenceMetricsEntry (message)
PairwiseTextGenerationEvaluationMetrics (message)
QuestionAnsweringEvaluationMetrics (message)
RegressionEvaluationMetrics (message)
SummarizationEvaluationMetrics (message)
TextExtractionEvaluationMetrics (message)
TextExtractionEvaluationMetrics.ConfidenceMetrics (message)
TextSentimentEvaluationMetrics (message)
TrackMetrics (message)
TrackMetrics.ConfidenceMetrics (message)
VideoActionMetrics (message)
VideoActionMetrics.ConfidenceMetrics (message)
VideoActionRecognitionMetrics (message)
VideoObjectTrackingMetrics (message)
BoundingBoxMetrics
Bounding box matching model metrics for a single intersection-over-union threshold and multiple label match confidence thresholds.
Fields | |
---|---|
confidence_metrics[] | Metrics for each label-match confidence_threshold from 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99. The precision-recall curve is derived from them.
iou_threshold | The intersection-over-union threshold value used to compute this metrics entry.
mean_average_precision | The mean average precision, most often close to au_prc, but sometimes (e.g. for limited recall) slightly different.
ConfidenceMetrics
Metrics for a single confidence threshold.
Fields | |
---|---|
confidence_threshold | The confidence threshold value used to compute the metrics.
recall | Recall under the given confidence threshold.
precision | Precision under the given confidence threshold.
f1_score | The harmonic mean of recall and precision.
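The f1_score field is the harmonic mean of the precision and recall values above, and the precision-recall curve is simply those pairs read off across thresholds. A minimal Python sketch (not an official client sample; the plain-dict entries are hypothetical):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical entries, one per confidence_threshold (0.05, 0.10, ..., 0.99).
confidence_metrics = [
    {"confidence_threshold": 0.05, "precision": 0.62, "recall": 0.91},
    {"confidence_threshold": 0.50, "precision": 0.81, "recall": 0.74},
    {"confidence_threshold": 0.95, "precision": 0.95, "recall": 0.33},
]

# The precision-recall curve is the (recall, precision) pairs across thresholds.
pr_curve = [(m["recall"], m["precision"]) for m in confidence_metrics]

for m in confidence_metrics:
    print(m["confidence_threshold"], f1_score(m["precision"], m["recall"]))
```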
ClassificationEvaluationMetrics
Metrics for classification evaluation results.
Fields | |
---|---|
confidence_metrics[] | Metrics for each confidence_threshold in 0.00,0.05,0.10,...,0.95,0.96,0.97,0.98,0.99. ROC and precision-recall curves, and other aggregated metrics, are derived from them. The confidence metrics entries may also be supplied for additional threshold values, but from these no aggregated metrics are computed.
confusion_matrix | Confusion matrix of the evaluation.
au_prc | The Area Under Precision-Recall Curve metric. Micro-averaged for the overall evaluation.
au_roc | The Area Under Receiver Operating Characteristic curve metric. Micro-averaged for the overall evaluation.
log_loss | The Log Loss metric.
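A minimal sketch of how an area-under-precision-recall-curve figure can be approximated from the per-threshold entries by trapezoidal integration over recall. This toy example does not micro-average across labels the way the service does; the entries are hypothetical.

```python
import numpy as np

# (confidence_threshold, recall, precision) triples from the entries above.
entries = [
    (0.00, 1.00, 0.40),
    (0.50, 0.75, 0.70),
    (0.99, 0.20, 0.95),
]

# Sort by recall so the integral runs left to right along the recall axis.
entries.sort(key=lambda e: e[1])
recall = np.array([e[1] for e in entries])
precision = np.array([e[2] for e in entries])

# Trapezoidal rule over the precision-recall curve.
au_prc_estimate = float(np.sum((recall[1:] - recall[:-1]) * (precision[1:] + precision[:-1]) / 2))
print(f"approximate au_prc: {au_prc_estimate:.3f}")
```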
ConfidenceMetrics
Fields | |
---|---|
confusion_matrix | Confusion matrix of the evaluation for this confidence_threshold.
confidence_threshold | Metrics are computed with an assumption that the Model never returns predictions with score lower than this value.
max_predictions | Metrics are computed with an assumption that the Model always returns at most this many predictions (ordered by their score, descendingly), but they all still need to meet the confidence_threshold.
recall | Recall (True Positive Rate) for the given confidence threshold.
precision | Precision for the given confidence threshold.
false_positive_rate | False Positive Rate for the given confidence threshold.
f1_score | The harmonic mean of recall and precision. For summary metrics, it computes the micro-averaged F1 score.
f1_score_micro | Micro-averaged F1 Score.
f1_score_macro | Macro-averaged F1 Score.
recall_at1 | The Recall (True Positive Rate) when only considering the label that has the highest prediction score and is not below the confidence threshold for each DataItem.
precision_at1 | The precision when only considering the label that has the highest prediction score and is not below the confidence threshold for each DataItem.
false_positive_rate_at1 | The False Positive Rate when only considering the label that has the highest prediction score and is not below the confidence threshold for each DataItem.
f1_score_at1 | The harmonic mean of recall_at1 and precision_at1.
true_positive_count | The number of Model-created labels that match a ground truth label.
false_positive_count | The number of Model-created labels that do not match a ground truth label.
false_negative_count | The number of ground truth labels that are not matched by a Model-created label.
true_negative_count | The number of labels that were not created by the Model but, had they been, would not have matched a ground truth label.
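A minimal sketch relating the four counts above to the per-threshold rates, using the standard definitions; the service reports all of these fields directly, and the counts below are hypothetical.

```python
def rates(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive precision, recall, FPR, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0             # false positive rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "false_positive_rate": fpr, "f1_score": f1}

print(rates(tp=80, fp=10, fn=20, tn=890))
```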
ConfusionMatrix
Fields | |
---|---|
annotation_specs[] | AnnotationSpecs used in the confusion matrix. For AutoML Text Extraction, a special negative AnnotationSpec with empty id is added to this list.
rows[] | Rows in the confusion matrix. The number of rows is equal to the size of annotation_specs. rows[i][j] is the number of DataItems that have ground truth of annotation_specs[i] and are predicted as annotation_specs[j] by the Model being evaluated. For Text Extraction, when annotation_specs[i] is the special negative AnnotationSpec, rows[i][j] counts predicted entities of annotation_specs[j] that do not match any ground truth AnnotationSpec.
AnnotationSpecRef
Fields | |
---|---|
id | ID of the AnnotationSpec.
display_name | Display name of the AnnotationSpec.
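A minimal sketch pairing rows with the annotation_specs display names to print a readable confusion matrix; both structures below are hypothetical examples, not service output.

```python
annotation_specs = [
    {"id": "1", "display_name": "cat"},
    {"id": "2", "display_name": "dog"},
]
rows = [
    [40, 3],   # ground truth "cat": 40 predicted as cat, 3 as dog
    [5, 52],   # ground truth "dog": 5 predicted as cat, 52 as dog
]

names = [spec["display_name"] for spec in annotation_specs]
print("ground truth \\ predicted:", names)
for name, row in zip(names, rows):
    print(f"{name:>12}", row)
```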
ForecastingEvaluationMetrics
Metrics for forecasting evaluation results.
Fields | |
---|---|
quantile_metrics[] | The quantile metrics entries for each quantile.
root_mean_squared_error | Root Mean Squared Error (RMSE).
mean_absolute_error | Mean Absolute Error (MAE).
mean_absolute_percentage_error | Mean absolute percentage error. Infinity when there are zeros in the ground truth.
r_squared | Coefficient of determination as Pearson correlation coefficient. Undefined when ground truth or predictions are constant or near constant.
root_mean_squared_log_error | Root mean squared log error. Undefined when there are negative ground truth values or predictions.
weighted_absolute_percentage_error | Weighted Absolute Percentage Error. Does not use weights; this is just what the metric is called. Undefined if actual values sum to zero. Will be very large if actual values sum to a very small number.
root_mean_square_percentage_error | Root Mean Square Percentage Error. Square root of MSPE. Undefined/imaginary when MSPE is negative.
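A minimal sketch computing these error metrics with standard textbook definitions in NumPy. The service's exact conventions (for example whether percentage metrics are reported as fractions or percents, and its precise r_squared definition) are not spelled out on this page, so treat those details as assumptions.

```python
import numpy as np

y_true = np.array([102.0, 98.0, 130.0, 85.0])   # hypothetical actuals
y_pred = np.array([110.0, 95.0, 121.0, 90.0])   # hypothetical forecasts

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
mae = np.mean(np.abs(y_true - y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100       # infinite if y_true has zeros
r_squared = np.corrcoef(y_true, y_pred)[0, 1] ** 2              # squared Pearson correlation
rmsle = np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
wape = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)) * 100
rmspe = np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)) * 100

print(rmse, mae, mape, r_squared, rmsle, wape, rmspe)
```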
QuantileMetricsEntry
Entry for the Quantiles loss type optimization objective.
Fields | |
---|---|
quantile | The quantile for this entry.
scaled_pinball_loss | The scaled pinball loss of this quantile.
observed_quantile | This is a custom metric that calculates the percentage of true values that were less than the predicted value for that quantile. Only populated when [optimization_objective][google.cloud.aiplatform.publicfiles.trainingjob.definition.AutoMlForecastingInputs.optimization_objective] is minimize-quantile-loss and each entry corresponds to an entry in [quantiles][google.cloud.aiplatform.publicfiles.trainingjob.definition.AutoMlForecastingInputs.quantiles]. The percent value can be used to compare with the quantile value, which is the target value.
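A minimal sketch of the underlying quantities: the textbook (unscaled) pinball loss and the observed quantile. The scaling applied by scaled_pinball_loss is not described on this page, so treat the scaling convention as an assumption.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile: float) -> float:
    """Unscaled pinball (quantile) loss for one quantile level."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(quantile * diff, (quantile - 1) * diff)))

def observed_quantile(y_true, y_pred) -> float:
    """Fraction of true values that fall below the predicted quantile value."""
    return float(np.mean(y_true < y_pred))

y_true = np.array([10.0, 12.0, 9.0, 15.0])
y_pred_p90 = np.array([13.0, 14.0, 11.0, 14.5])   # hypothetical 0.9-quantile forecasts
print(pinball_loss(y_true, y_pred_p90, 0.9), observed_quantile(y_true, y_pred_p90))
```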
GeneralTextGenerationEvaluationMetrics
Fields | |
---|---|
bleu | BLEU (bilingual evaluation understudy) scores based on the sacrebleu implementation.
rouge_l_sum | ROUGE-L (Longest Common Subsequence) scoring at summary level.
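A minimal sketch using the open-source sacrebleu and rouge-score packages, which implement the same metric families; the service's exact configuration (tokenization, multi-reference handling, aggregation) is not documented on this page.

```python
import sacrebleu
from rouge_score import rouge_scorer

predictions = ["the cat sat on the mat"]
references = ["the cat is sitting on the mat"]

# Corpus-level BLEU; references are passed as one stream per reference set.
bleu = sacrebleu.corpus_bleu(predictions, [references])
print("BLEU:", bleu.score)

# Summary-level ROUGE-L ("rougeLsum").
scorer = rouge_scorer.RougeScorer(["rougeLsum"], use_stemmer=True)
rouge = scorer.score(references[0], predictions[0])
print("ROUGE-L-sum F1:", rouge["rougeLsum"].fmeasure)
```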
ImageObjectDetectionEvaluationMetrics
Metrics for image object detection evaluation results.
Fields | |
---|---|
bounding_box_metrics[] | The bounding box match metrics for each intersection-over-union threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and each label confidence threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 pair.
evaluated_bounding_box_count | The total number of bounding boxes (i.e. summed over all images) the ground truth used to create this evaluation had.
bounding_box_mean_average_precision | The single metric for bounding boxes evaluation: the mean_average_precision averaged over all bounding_box_metrics entries.
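A minimal sketch of how the summary bounding_box_mean_average_precision relates to the per-IoU-threshold entries, assuming a simple unweighted average; the entries below are hypothetical.

```python
# Hypothetical BoundingBoxMetrics-like entries, one per iou_threshold.
bounding_box_metrics = [
    {"iou_threshold": 0.05, "mean_average_precision": 0.83},
    {"iou_threshold": 0.50, "mean_average_precision": 0.71},
    {"iou_threshold": 0.95, "mean_average_precision": 0.42},
]

# Unweighted mean over the per-threshold mean average precision values.
bounding_box_mean_average_precision = sum(
    m["mean_average_precision"] for m in bounding_box_metrics
) / len(bounding_box_metrics)
print(bounding_box_mean_average_precision)
```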
ImageSegmentationEvaluationMetrics
Metrics for image segmentation evaluation results.
Fields | |
---|---|
confidence_metrics_entries[] | Metrics for each confidence_threshold in 0.00,0.05,0.10,...,0.95,0.96,0.97,0.98,0.99. The precision-recall curve can be derived from them.
ConfidenceMetricsEntry
Fields | |
---|---|
confusion_matrix | Confusion matrix for the given confidence threshold.
confidence_threshold | Metrics are computed with an assumption that the model never returns predictions with score lower than this value.
recall | Recall (True Positive Rate) for the given confidence threshold.
precision | Precision for the given confidence threshold.
dice_score_coefficient | DSC or the F1 score: the harmonic mean of recall and precision.
iou_score | The intersection-over-union score. The measure of overlap of the annotation's category mask with the ground truth category mask on the DataItem.
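A minimal sketch of dice_score_coefficient and iou_score for a single category, computed from boolean masks with NumPy; the masks are toy data, not service output.

```python
import numpy as np

pred_mask = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)   # predicted category mask
true_mask = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)   # ground truth category mask

intersection = np.logical_and(pred_mask, true_mask).sum()
union = np.logical_or(pred_mask, true_mask).sum()

iou_score = intersection / union
dice_score_coefficient = 2 * intersection / (pred_mask.sum() + true_mask.sum())
print(iou_score, dice_score_coefficient)
```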
PairwiseTextGenerationEvaluationMetrics
Metrics for general pairwise text generation evaluation results.
Fields | |
---|---|
model_win_rate | Percentage of time the autorater decided the model had the better response.
baseline_model_win_rate | Percentage of time the autorater decided the baseline model had the better response.
human_preference_model_win_rate | Percentage of time humans decided the model had the better response.
human_preference_baseline_model_win_rate | Percentage of time humans decided the baseline model had the better response.
true_positive_count | Number of examples where both the autorater and humans decided that the model had the better response.
false_positive_count | Number of examples where the autorater chose the model, but humans preferred the baseline model.
false_negative_count | Number of examples where the autorater chose the baseline model, but humans preferred the model.
true_negative_count | Number of examples where both the autorater and humans decided that the model had the worse response.
accuracy | Fraction of cases where the autorater agreed with the human raters.
precision | Fraction of cases where the autorater and humans thought the model had a better response out of all cases where the autorater thought the model had a better response. True positives divided by all positives.
recall | Fraction of cases where the autorater and humans thought the model had a better response out of all cases where the humans thought the model had a better response.
f1_score | Harmonic mean of precision and recall.
cohens_kappa | A measurement of agreement between the autorater and human raters that takes the likelihood of random agreement into account.
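A minimal sketch deriving the agreement metrics above from the four counts with standard 2x2 definitions, including chance-corrected Cohen's kappa; the counts are hypothetical.

```python
def pairwise_agreement(tp: int, fp: int, fn: int, tn: int) -> dict:
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Expected agreement under chance, from the marginals of the 2x2 table
    # (autorater "model wins" vs. human "model wins").
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1_score": f1, "cohens_kappa": kappa}

print(pairwise_agreement(tp=60, fp=15, fn=10, tn=65))
```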
QuestionAnsweringEvaluationMetrics
Fields | |
---|---|
exact_match | The rate at which the input predicted strings exactly match their references.
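A minimal sketch of an exact-match rate as the fraction of predictions that are string-identical to their references; any text normalization the service applies before matching is not documented here.

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference strings."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)

print(exact_match(["Paris", "42"], ["Paris", "41"]))  # 0.5
```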
RegressionEvaluationMetrics
Metrics for regression evaluation results.
Fields | |
---|---|
root_mean_squared_error | Root Mean Squared Error (RMSE).
mean_absolute_error | Mean Absolute Error (MAE).
mean_absolute_percentage_error | Mean absolute percentage error. Infinity when there are zeros in the ground truth.
r_squared | Coefficient of determination as Pearson correlation coefficient. Undefined when ground truth or predictions are constant or near constant.
root_mean_squared_log_error | Root mean squared log error. Undefined when there are negative ground truth values or predictions.
SummarizationEvaluationMetrics
Fields | |
---|---|
rouge_l_sum | ROUGE-L (Longest Common Subsequence) scoring at summary level.
TextExtractionEvaluationMetrics
Metrics for text extraction evaluation results.
Fields | |
---|---|
confidence_metrics[] | Metrics that have confidence thresholds. The precision-recall curve can be derived from them.
confusion_matrix | Confusion matrix of the evaluation. Only set for Models where the number of AnnotationSpecs is no more than 10. Only set for ModelEvaluations, not for ModelEvaluationSlices.
ConfidenceMetrics
Fields | |
---|---|
confidence_threshold | Metrics are computed with an assumption that the Model never returns predictions with score lower than this value.
recall | Recall (True Positive Rate) for the given confidence threshold.
precision | Precision for the given confidence threshold.
f1_score | The harmonic mean of recall and precision.
TextSentimentEvaluationMetrics
Model evaluation metrics for text sentiment problems.
Fields | |
---|---|
confusion_matrix | Confusion matrix of the evaluation. Only set for ModelEvaluations, not for ModelEvaluationSlices.
precision | Precision.
recall | Recall.
f1_score | The harmonic mean of recall and precision.
mean_absolute_error | Mean absolute error. Only set for ModelEvaluations, not for ModelEvaluationSlices.
mean_squared_error | Mean squared error. Only set for ModelEvaluations, not for ModelEvaluationSlices.
linear_kappa | Linear weighted kappa. Only set for ModelEvaluations, not for ModelEvaluationSlices.
quadratic_kappa | Quadratic weighted kappa. Only set for ModelEvaluations, not for ModelEvaluationSlices.
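A minimal sketch of linear and quadratic weighted kappa computed from a sentiment confusion matrix with the standard weighted-kappa formula; the 3-class matrix (sentiment levels 0, 1, 2) is hypothetical.

```python
import numpy as np

def weighted_kappa(confusion: np.ndarray, weighting: str = "linear") -> float:
    """Weighted kappa: 1 - sum(w*observed) / sum(w*expected)."""
    n_classes = confusion.shape[0]
    observed = confusion / confusion.sum()
    # Expected matrix under independence of the row/column marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_classes, n_classes))
    if weighting == "linear":
        weights = np.abs(i - j) / (n_classes - 1)
    else:  # quadratic
        weights = ((i - j) / (n_classes - 1)) ** 2
    return 1 - (weights * observed).sum() / (weights * expected).sum()

confusion_matrix = np.array([[30, 5, 1], [4, 40, 6], [0, 7, 27]])
print(weighted_kappa(confusion_matrix, "linear"),
      weighted_kappa(confusion_matrix, "quadratic"))
```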
TrackMetrics
UNIMPLEMENTED. Track matching model metrics for a single track match threshold and multiple label match confidence thresholds.
Fields | |
---|---|
confidence_metrics[] | Metrics for each label-match confidence_threshold from 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99.
iou_threshold | The intersection-over-union threshold value between bounding boxes across frames used to compute this metrics entry.
mean_tracking_average_precision | The mean average precision over all confidence thresholds.
mean_bounding_box_iou | The mean bounding box IoU over all confidence thresholds.
mean_mismatch_rate | The mean mismatch rate over all confidence thresholds.
ConfidenceMetrics
Metrics for a single confidence threshold.
Fields | |
---|---|
confidence_threshold | The confidence threshold value used to compute the metrics.
tracking_precision | Tracking precision.
tracking_recall | Tracking recall.
bounding_box_iou | Bounding box intersection-over-union precision. Measures how well the bounding boxes overlap with each other (e.g. complete overlap or just barely above iou_threshold).
mismatch_rate | Mismatch rate, which measures the tracking consistency, i.e. the correctness of instance ID continuity.
VideoActionMetrics
The evaluation metrics given a specific precision_window_length.
Fields | |
---|---|
confidence_metrics[] | Metrics for each label-match confidence_threshold from 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99.
precision_window_length | The precision window length used to compute this metrics entry.
mean_average_precision | The mean average precision.
ConfidenceMetrics
Metrics for a single confidence threshold.
Fields | |
---|---|
confidence_threshold | Output only. The confidence threshold value used to compute the metrics.
recall | Output only. Recall for the given confidence threshold.
precision | Output only. Precision for the given confidence threshold.
f1_score | Output only. The harmonic mean of recall and precision.
VideoActionRecognitionMetrics
Model evaluation metrics for video action recognition.
Fields | |
---|---|
video_action_metrics[] | The metric entries for precision window lengths: 1s, 2s, 3s.
evaluated_action_count | The number of ground truth actions used to create this evaluation.
VideoObjectTrackingMetrics
Model evaluation metrics for video object tracking problems. Evaluates prediction quality of both labeled bounding boxes and labeled tracks (i.e. series of bounding boxes sharing the same label and instance ID).
Fields | |
---|---|
bounding_box_metrics[] | The bounding box match metrics for each intersection-over-union threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and each label confidence threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 pair.
track_metrics[] | UNIMPLEMENTED. The track match metrics for each intersection-over-union threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and each label confidence threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 pair.
evaluated_frame_count | UNIMPLEMENTED. The number of video frames used to create this evaluation.
evaluated_bounding_box_count | UNIMPLEMENTED. The total number of bounding boxes (i.e. summed over all frames) the ground truth used to create this evaluation had.
evaluated_track_count | UNIMPLEMENTED. The total number of tracks (i.e. as seen across all frames) the ground truth used to create this evaluation had.
bounding_box_mean_average_precision | The single metric for bounding boxes evaluation: the mean_average_precision averaged over all bounding_box_metrics entries.
track_mean_average_precision | UNIMPLEMENTED. The single metric for tracks accuracy evaluation: the mean_tracking_average_precision averaged over all track_metrics entries.
track_mean_bounding_box_iou | UNIMPLEMENTED. The single metric for tracks bounding box IoU evaluation: the mean_bounding_box_iou averaged over all track_metrics entries.
track_mean_mismatch_rate | UNIMPLEMENTED. The single metric for tracking consistency evaluation: the mean_mismatch_rate averaged over all track_metrics entries.