Analyze videos for labels
The Video Intelligence API can identify entities shown in video footage
using the LABEL_DETECTION
feature and annotate these entities with labels (tags). This feature identifies
objects, locations, activities, animal species, products, and more.
Label detection differs from Object tracking: rather than tracking individual
objects with bounding boxes, label detection applies labels to the entire frame.
For example, for a video of a train at a crossing, the Video Intelligence API
returns labels such as "train", "transportation", "railroad crossing",
and so on. Each label includes a time segment with the time offset (timestamp)
of the entity's appearance, measured from the beginning of the video.
Each annotation also contains additional information, including an entity
id that you can use to find more information about the
entity in the Google Knowledge Graph Search API.
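For illustration, a minimal sketch of looking up a returned entity id with the
Knowledge Graph Search API, assuming the Python `requests` package; the entity
id and API key below are placeholders:

```python
# A minimal sketch: look up an entity id returned by label detection in the
# Knowledge Graph Search API. The entity id and API key are placeholders.
import requests

resp = requests.get(
    "https://kgsearch.googleapis.com/v1/entities:search",
    params={"ids": "/m/0bt9lr", "key": "YOUR_API_KEY", "limit": 1},
)
resp.raise_for_status()
for element in resp.json().get("itemListElement", []):
    result = element["result"]
    print(result.get("name"), "-", result.get("description"))
```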
Each entity returned can also include associated
category entities in the categoryEntities field. For example, the
"Terrier" entity label has a category of "Dog". Category entities form a
hierarchy; for example, the "Dog" category is a child of the "Mammal"
category. For a list of the common category entities that the
Video Intelligence API uses, see
entry-level-categories.json.
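For illustration, a single label annotation in the JSON response might look
like the following sketch; all field values, including the entity ids, are
made up for this example:

```json
{
  "entity": {
    "entityId": "/m/0kpqd",
    "description": "terrier",
    "languageCode": "en-US"
  },
  "categoryEntities": [
    {
      "entityId": "/m/0bt9lr",
      "description": "dog",
      "languageCode": "en-US"
    }
  ],
  "segments": [
    {
      "segment": {
        "startTimeOffset": "0s",
        "endTimeOffset": "14.8s"
      },
      "confidence": 0.94
    }
  ]
}
```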
The analysis can be compartmentalized as follows:
Segment level: You can select segments of a video for analysis by specifying
start and end timestamps for annotation (see VideoSegment); a request sketch
follows this list. Entities are then identified and labeled within each
segment. If no segments are specified, the whole video is treated as one
segment.
Shot level: Shots (also known as scenes) are automatically detected within
every segment (or the whole video). Entities are then identified and labeled
within each shot. For details, see Shot change detection.
Frame level: Entities are identified and labeled within each frame, sampled
at one frame per second.
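As referenced in the segment-level item above, here is a minimal sketch of
restricting label detection to the first 30 seconds of a video, assuming the
google-cloud-videointelligence Python client (v2.x); the Cloud Storage path
is a placeholder:

```python
# A minimal sketch: run label detection over one user-selected segment,
# assuming the google-cloud-videointelligence Python client (v2.x).
from google.cloud import videointelligence
from google.protobuf import duration_pb2

client = videointelligence.VideoIntelligenceServiceClient()

# Analyze only the first 30 seconds of the video (placeholder URI).
segment = videointelligence.VideoSegment(
    start_time_offset=duration_pb2.Duration(seconds=0),
    end_time_offset=duration_pb2.Duration(seconds=30),
)
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://YOUR_BUCKET/your-video.mp4",
        "video_context": videointelligence.VideoContext(segments=[segment]),
    }
)
result = operation.result(timeout=300)
```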
To detect labels in a video, call the annotate method and specify
LABEL_DETECTION in the features field.
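A minimal end-to-end sketch, assuming the google-cloud-videointelligence
Python client (v2.x) and a placeholder Cloud Storage URI:

```python
# A minimal sketch: request label detection and print segment-level labels,
# assuming the google-cloud-videointelligence Python client (v2.x).
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://YOUR_BUCKET/your-video.mp4",
    }
)
result = operation.result(timeout=300)  # blocks until the operation finishes

for label in result.annotation_results[0].segment_label_annotations:
    categories = ", ".join(c.description for c in label.category_entities)
    print(f"{label.entity.description} (categories: {categories})")
    for segment in label.segments:
        start = segment.segment.start_time_offset.total_seconds()
        end = segment.segment.end_time_offset.total_seconds()
        print(f"  {start:.1f}s to {end:.1f}s (confidence {segment.confidence:.2f})")
```

Shot-level labels are read analogously from shot_label_annotations;
frame-level labels in frame_label_annotations report individual frames, each
with a single time_offset rather than a segment.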
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-03 UTC."],[],[],null,["# Analyze videos for labels\n\nThe Video Intelligence API can identify entities shown in video footage\nusing the [LABEL_DETECTION](/video-intelligence/docs/reference/rest/v1/videos/annotate#feature)\nfeature and annotate these entities with labels (tags). This feature identifies\nobjects, locations, activities, animal species, products, and more.\n\nLabel detection differs from [Object tracking](/video-intelligence/docs/object-tracking).\nUnlike object tracking, label detection provides labels for the entire frame\n(without bounding boxes).\n\nFor example, for a video of a train at a crossing, the Video Intelligence API\nreturns labels such as \"train\", \"transportation\", \"railroad crossing\",\nand so on. Each label includes a time segment with the time offset (timestamp)\nfor the entity's appearance from the beginning of the video.\nEach annotation also contains additional information including an entity\nid that you can use to find more information about the\nentity in the [Google Knowledge Graph Search API](https://developers.google.com/knowledge-graph/).\n\nEach entity returned can also include associated\ncategory entities in the `categoryEntities` field. For example the\n\"Terrier\" entity label has a category of \"Dog\". Category entities have a\nhierarchy. For example, the \"Dog\" category is a child of the \"Mammal\"\ncategory in the hierarchy. For a list of the common category entities that the\nVideo Intelligence uses, see\n[entry-level-categories.json](/static/video-intelligence/docs/entry-level-categories.json).\n\nThe analysis can be compartmentalized as follows:\n\n- Segment level: \n User-selected segments of a video can be specified for analysis by stipulating beginning and ending timestamps for the purposes of annotation (see [VideoSegment](/video-intelligence/docs/reference/rest/v1/videos/annotate#videosegment)). Entities are then identified and labeled within each segment. If no segments are specified, the whole video is treated as one segment.\n\n \u003cbr /\u003e\n\n \u003cbr /\u003e\n\n- Shot level: \n Shots (also known as a *scene* ) are automatically detected within every segment (or video). Entities are then identified and labeled within each scene. For details, see [Shot change detection](#shot-change)\n- Frame level: \n Entities are identified and labeled within each frame (with one frame per second sampling).\n\n\u003cbr /\u003e\n\nTo detect labels in a video, call the\n[`annotate`](/video-intelligence/docs/reference/rest/v1/videos/annotate)\nmethod and specify\n[`LABEL_DETECTION`](/video-intelligence/docs/reference/rest/v1/videos#Feature)\nin the `features` field.\n\nSee\n[Analyzing Videos for Labels](/video-intelligence/docs/analyze-labels) and\n[Label Detection Tutorial](/video-intelligence/docs/label-tutorial).\n\nVideo Intelligence API Visualizer\n=================================\n\nCheck out the [Video Intelligence API visualizer](https://zackakil.github.io/video-intelligence-api-visualiser/#Label%20Detection) to see this feature in action."]]