- JSON representation
- VideoAnnotationResults
- LabelAnnotation
- Entity
- LabelSegment
- VideoSegment
- LabelFrame
- ExplicitContentAnnotation
- ExplicitContentFrame
- SpeechTranscription
- SpeechRecognitionAlternative
- WordInfo
Video annotation response. Included in the response field of the Operation returned by the operations.get call of the google::longrunning::Operations service.
| JSON representation | |
|---|---|
| {
  "annotationResults": [
    {
      object( | |
| Fields | |
|---|---|
| annotationResults[] | Annotation results for all videos specified in … |
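As a rough illustration of where this message sits, the sketch below pulls annotationResults out of an Operation that has already been fetched and decoded from JSON. The `done` and `error` keys are assumed from the generic long-running Operation shape, since this section only names the response field. The helper name and all sample values are hypothetical.

```python
def extract_annotation_results(operation):
    """Return the annotationResults list from a finished Operation dict.

    Returns None while the operation is still running. The `done` and
    `error` keys are assumptions based on the long-running Operation shape.
    """
    if not operation.get("done", False):
        return None
    if "error" in operation:
        message = operation["error"].get("message", "operation failed")
        raise RuntimeError(message)
    return operation.get("response", {}).get("annotationResults", [])


# Made-up Operation payloads, for illustration only.
finished = {
    "name": "operations/example",
    "done": True,
    "response": {"annotationResults": [{"inputUri": "gs://example-bucket/clip.mp4"}]},
}
results = extract_annotation_results(finished)
pending = extract_annotation_results({"name": "operations/example", "done": False})
```

Returning None for an unfinished operation keeps "not ready yet" distinct from "finished with no results", which comes back as an empty list.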
VideoAnnotationResults
Annotation results for a single video.
| JSON representation | |
|---|---|
| { "inputUri": string, "segmentLabelAnnotations": [ { object( | |
| Fields | |
|---|---|
| inputUri | Video file location in Google Cloud Storage. |
| segmentLabelAnnotations[] | Label annotations at the video level or the user-specified segment level. There is exactly one element for each unique label. |
| shotLabelAnnotations[] | Label annotations at the shot level. There is exactly one element for each unique label. |
| frameLabelAnnotations[] | Label annotations at the frame level. There is exactly one element for each unique label. |
| shotAnnotations[] | Shot annotations. Each shot is represented as a video segment. |
| explicitAnnotation | Explicit content annotation. |
| speechTranscriptions[] | Speech transcription. |
| error | If set, indicates an error. Note that for a single … |
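Because the error field is set per video, a caller will usually want to check it on each element before reading the annotation lists. The following is a minimal sketch of that check, assuming the response has already been decoded into Python dicts. The function name, the sample URIs, and the contents of the error object are hypothetical.

```python
def split_results(response):
    """Split per-video results into (succeeded, failed) lists.

    A result counts as failed when its `error` field is present.
    """
    succeeded, failed = [], []
    for result in response.get("annotationResults", []):
        if "error" in result:
            failed.append(result)
        else:
            succeeded.append(result)
    return succeeded, failed


# Made-up response with one good and one failed video.
sample_response = {
    "annotationResults": [
        {
            "inputUri": "gs://example-bucket/ok.mp4",
            "shotAnnotations": [
                {"startTimeOffset": "0s", "endTimeOffset": "4.2s"},
            ],
        },
        {
            "inputUri": "gs://example-bucket/broken.mp4",
            "error": {"message": "illustrative failure"},  # shape assumed
        },
    ]
}

ok, bad = split_results(sample_response)
ok_uris = [r["inputUri"] for r in ok]
bad_uris = [r["inputUri"] for r in bad]
```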
LabelAnnotation
Label annotation.
| JSON representation | |
|---|---|
| { "entity": { object( | |
| Fields | |
|---|---|
| entity | Detected entity. |
| categoryEntities[] | Common categories for the detected entity. E.g. when the label is … |
| segments[] | All video segments where a label was detected. |
| frames[] | All video frames where a label was detected. |
Entity
Detected entity from video analysis.
| JSON representation | |
|---|---|
| { "entityId": string, "description": string, "languageCode": string } | |
| Fields | |
|---|---|
| entityId | Opaque entity ID. Some IDs may be available in the Google Knowledge Graph Search API. |
| description | Textual description, e.g. … |
| languageCode | Language code for … |
LabelSegment
Video segment level annotation results for label detection.
| JSON representation | |
|---|---|
| {
  "segment": {
    object( | |
| Fields | |
|---|---|
| segment | Video segment where a label was detected. |
| confidence | Confidence that the label is accurate. Range: [0, 1]. |
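Since each LabelAnnotation carries its own list of segments with a confidence in [0, 1], a common step is to keep only the segments above some cutoff. The sketch below shows one way to do that over plain dicts. The function name, the 0.5 default, and the sample labels are illustrative assumptions, not part of the API.

```python
def confident_segments(label_annotations, min_confidence=0.5):
    """Return (description, start, end, confidence) for confident segments."""
    hits = []
    for annotation in label_annotations:
        description = annotation["entity"]["description"]
        for label_segment in annotation.get("segments", []):
            if label_segment["confidence"] >= min_confidence:
                segment = label_segment["segment"]
                hits.append((
                    description,
                    segment["startTimeOffset"],
                    segment["endTimeOffset"],
                    label_segment["confidence"],
                ))
    return hits


# Made-up label annotations for illustration.
labels = [
    {
        "entity": {"entityId": "example-1", "description": "bicycle"},
        "segments": [
            {"segment": {"startTimeOffset": "0s", "endTimeOffset": "10s"},
             "confidence": 0.92},
        ],
    },
    {
        "entity": {"entityId": "example-2", "description": "tree"},
        "segments": [
            {"segment": {"startTimeOffset": "0s", "endTimeOffset": "10s"},
             "confidence": 0.31},
        ],
    },
]

kept = confident_segments(labels)
```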
VideoSegment
Video segment.
| JSON representation | |
|---|---|
| { "startTimeOffset": string, "endTimeOffset": string } | |
| Fields | |
|---|---|
| startTimeOffset | Time offset, relative to the beginning of the video, corresponding to the start of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by 's' (for example, "3.5s"). |
| endTimeOffset | Time offset, relative to the beginning of the video, corresponding to the end of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by 's' (for example, "3.5s"). |
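The offsets arrive as strings rather than numbers, so they need parsing before any arithmetic. The sketch below assumes the terminator is the letter s, as in the standard JSON encoding of a duration (for example "3.5s"), and uses Decimal so that up to nine fractional digits survive without rounding. Both helper names are hypothetical.

```python
from decimal import Decimal


def parse_offset(offset):
    """Convert a duration string such as "3.5s" into seconds as a Decimal."""
    if not offset.endswith("s"):
        raise ValueError(f"expected a value ending in 's', got {offset!r}")
    return Decimal(offset[:-1])


def segment_length(segment):
    """Length of a VideoSegment dict in seconds."""
    start = parse_offset(segment["startTimeOffset"])
    end = parse_offset(segment["endTimeOffset"])
    return end - start


# Made-up segment for illustration.
length = segment_length({"startTimeOffset": "1.25s", "endTimeOffset": "4s"})
```

A float would also work for most uses; Decimal is chosen here only to avoid binary rounding on nanosecond-precision values.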
LabelFrame
Video frame level annotation results for label detection.
| JSON representation | |
|---|---|
| { "timeOffset": string, "confidence": number } | |
| Fields | |
|---|---|
| timeOffset | Time offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by 's' (for example, "3.5s"). |
| confidence | Confidence that the label is accurate. Range: [0, 1]. |
ExplicitContentAnnotation
Explicit content annotation (based on per-frame visual signals only). If no explicit content has been detected in a frame, no annotations are present for that frame.
| JSON representation | |
|---|---|
| {
  "frames": [
    {
      object( | |
| Fields | |
|---|---|
| frames[] | All video frames where explicit content was detected. |
ExplicitContentFrame
Video frame level annotation results for explicit content.
| JSON representation | |
|---|---|
| {
  "timeOffset": string,
  "pornographyLikelihood": enum( | |
| Fields | |
|---|---|
| timeOffset | Time offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by 's' (for example, "3.5s"). |
| pornographyLikelihood | Likelihood of pornographic content. |
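pornographyLikelihood is an enum, so comparing frames against a threshold needs an explicit ordering. This section does not list the enum values; the sketch below assumes the usual six-step likelihood scale from LIKELIHOOD_UNSPECIFIED through VERY_LIKELY, and that assumption should be checked against the enum reference. The function name and sample frames are hypothetical.

```python
# Assumed enum ordering, least to most likely; not listed in this section.
LIKELIHOOD_ORDER = [
    "LIKELIHOOD_UNSPECIFIED",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def flagged_offsets(explicit_annotation, threshold="LIKELY"):
    """Return timeOffset values of frames at or above the threshold."""
    cutoff = LIKELIHOOD_ORDER.index(threshold)
    return [
        frame["timeOffset"]
        for frame in explicit_annotation.get("frames", [])
        if LIKELIHOOD_ORDER.index(frame["pornographyLikelihood"]) >= cutoff
    ]


# Made-up ExplicitContentAnnotation for illustration.
annotation = {
    "frames": [
        {"timeOffset": "0.5s", "pornographyLikelihood": "VERY_UNLIKELY"},
        {"timeOffset": "1.5s", "pornographyLikelihood": "LIKELY"},
        {"timeOffset": "2.5s", "pornographyLikelihood": "VERY_LIKELY"},
    ]
}

flagged = flagged_offsets(annotation)
```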
SpeechTranscription
A speech recognition result corresponding to a portion of the audio.
| JSON representation | |
|---|---|
| {
  "alternatives": [
    {
      object( | |
| Fields | |
|---|---|
| alternatives[] | Output only. May contain one or more recognition hypotheses (up to the maximum specified in …). |
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
| JSON representation | |
|---|---|
| {
  "transcript": string,
  "confidence": number,
  "words": [
    {
      object( | |
| Fields | |
|---|---|
| transcript | Output only. Transcript text representing the words that the user spoke. |
| confidence | Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is typically provided only for the top hypothesis, and only for … |
| words[] | Output only. A list of word-specific information for each recognized word. |
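Because confidence is typically present only on the top hypothesis, code that picks an alternative should tolerate the field being absent. One possible approach, sketched below, treats a missing confidence as 0.0 and falls back to the first alternative on ties. The function name and sample data are hypothetical.

```python
def best_transcript(transcription):
    """Return the transcript of the highest-confidence alternative.

    Alternatives without a confidence value are scored as 0.0; when scores
    tie, the earliest alternative in the list wins. Returns None if the
    transcription has no alternatives.
    """
    alternatives = transcription.get("alternatives", [])
    if not alternatives:
        return None
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


# Made-up SpeechTranscription for illustration.
transcription = {
    "alternatives": [
        {"transcript": "hello world", "confidence": 0.87},
        {"transcript": "hollow world"},
    ]
}

text = best_transcript(transcription)
```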
WordInfo
Word-specific information for recognized words. Word information is only included in the response when certain request parameters are set, such as enable_word_time_offsets.
| JSON representation | |
|---|---|
| { "startTime": string, "endTime": string, "word": string } | |
| Fields | |
|---|---|
| startTime | Output only. Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if … A duration in seconds with up to nine fractional digits, terminated by 's' (for example, "3.5s"). |
| endTime | Output only. Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if … A duration in seconds with up to nine fractional digits, terminated by 's' (for example, "3.5s"). |
| word | Output only. The word corresponding to this set of information. |
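When word timing is present, the words list can be turned into a simple timeline. The sketch below is a minimal example, assuming durations end in the letter s and that every WordInfo carries both offsets. It skips entries where either offset is missing, since the fields are only set under certain request settings. Helper names and sample values are hypothetical.

```python
def _seconds(offset):
    """Convert a duration string such as "1.5s" to a float of seconds."""
    if not offset.endswith("s"):
        raise ValueError(f"expected a value ending in 's', got {offset!r}")
    return float(offset[:-1])


def word_timeline(alternative):
    """Return (word, start_seconds, end_seconds) tuples for timed words."""
    timeline = []
    for info in alternative.get("words", []):
        if "startTime" not in info or "endTime" not in info:
            continue  # timing not provided for this word
        timeline.append(
            (info["word"], _seconds(info["startTime"]), _seconds(info["endTime"]))
        )
    return timeline


# Made-up SpeechRecognitionAlternative for illustration.
alternative = {
    "transcript": "hello world",
    "words": [
        {"word": "hello", "startTime": "0s", "endTime": "0.5s"},
        {"word": "world", "startTime": "0.5s", "endTime": "1s"},
        {"word": "untimed"},
    ],
}

timeline = word_timeline(alternative)
```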