Interpret prediction results from video action recognition models

After you request a prediction, Vertex AI returns results based on your model's objective. Predictions from an action recognition model return the moments at which actions occur, labeled according to the labels you defined. The model assigns a confidence score to each prediction, which indicates how confident the model is that it identified the action correctly. The higher the score, the more confident the model is that the prediction is correct.

Example batch prediction output

The following sample shows the predicted result for a model that identifies the "swing" and "jump" actions in a video. Each result includes a label ("swing" or "jump") for the identified action, a time segment whose start and end times are identical and mark the moment of the action, and a confidence score.

{
  "instance": {
    "content": "gs://bucket/video.mp4",
    "mimeType": "video/mp4",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "5s"
  },
  "prediction": [{
    "id": "1",
    "displayName": "swing",
    "timeSegmentStart": "1.2s",
    "timeSegmentEnd": "1.2s",
    "confidence": 0.7
  }, {
    "id": "2",
    "displayName": "jump",
    "timeSegmentStart": "3.4s",
    "timeSegmentEnd": "3.4s",
    "confidence": 0.5
  }]
}
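
A result shaped like the sample above can be post-processed to keep only the actions the model is reasonably confident about. The following is a minimal sketch in Python, assuming the result is available as a single well-formed JSON object. The helper names `parse_seconds` and `confident_actions` and the `MIN_CONFIDENCE` cutoff are hypothetical and chosen for illustration; they are not part of any Vertex AI API.

```python
import json

# One result record, shaped like the sample output above.
RESULT_JSON = """
{
  "instance": {
    "content": "gs://bucket/video.mp4",
    "mimeType": "video/mp4",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "5s"
  },
  "prediction": [
    {"id": "1", "displayName": "swing", "timeSegmentStart": "1.2s",
     "timeSegmentEnd": "1.2s", "confidence": 0.7},
    {"id": "2", "displayName": "jump", "timeSegmentStart": "3.4s",
     "timeSegmentEnd": "3.4s", "confidence": 0.5}
  ]
}
"""


def parse_seconds(value):
    """Convert a duration string such as "1.2s" to a float number of seconds."""
    return float(value.rstrip("s"))


def confident_actions(result, min_confidence):
    """Return (label, moment in seconds, confidence) for each prediction
    whose confidence is at or above min_confidence."""
    actions = []
    for prediction in result["prediction"]:
        if prediction["confidence"] >= min_confidence:
            actions.append((
                prediction["displayName"],
                # Start and end are identical for an action moment,
                # so the start time alone identifies it.
                parse_seconds(prediction["timeSegmentStart"]),
                prediction["confidence"],
            ))
    return actions


result = json.loads(RESULT_JSON)

# Illustrative cutoff only; choose a value that suits your own model and data.
MIN_CONFIDENCE = 0.6

for label, moment, confidence in confident_actions(result, MIN_CONFIDENCE):
    print(f"{label} at {moment}s (confidence {confidence})")
```

With the sample data and a cutoff of 0.6, only the "swing" prediction is kept. Lowering the cutoff to 0.5 would also keep "jump".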