Prepare video training data for classification
This page describes how to prepare video training data for use in a Vertex AI dataset to train a video classification model.
The following sections provide information about data requirements, schema files, and the format of the data import files (JSONL & CSV) that are defined by the schema.
Alternatively, you can import videos that have not been annotated and annotate them later using the Google Cloud console (see Labeling using the Google Cloud console).
Data requirements
The following requirements apply to datasets used to train AutoML or custom-trained models.
Vertex AI supports the following video formats for training your model or requesting a prediction (annotating a video).
.MOV
.MPEG4
.MP4
.AVI
To view the video content in the web console or to annotate a video, the video must be in a format that your browser natively supports. Since not all browsers handle .MOV or .AVI content natively, the recommendation is to use either the .MPEG4 or .MP4 video format.
Maximum file size is 50 GB (up to 3 hours in duration). Individual video files with malformed or empty timestamps in the container aren't supported.
The maximum number of labels in each dataset is limited to 1,000.
You may assign "ML_USE" labels to the videos in the import files. At training time, you may choose to use those labels to split the videos and their corresponding annotations into "training" or "test" sets. For video classification, note the following:
At least two different classes are required for model training. For example, "news" and "MTV", or "game" and "others".
Consider including a "None_of_the_above" class and video segments that do not match any of your defined classes.
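Because every row that references the same video should carry the same ML_USE designation, one way to produce the split is to derive it deterministically from the video URI instead of sampling at random. The sketch below is illustrative only; the function name and the 80/20 split are assumptions, not part of any Vertex AI API.

```python
import hashlib

def assign_ml_use(video_uri: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a video to the "training" or "test" set.

    Hashing the URI (rather than sampling randomly per row) means every
    import row that references the same video gets the same ML_USE value,
    so one video's annotations never straddle the train/test boundary.
    """
    digest = hashlib.sha256(video_uri.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "test" if bucket < test_fraction else "training"
```

Calling the function twice with the same URI always returns the same value, which is what keeps the split consistent across re-generated import files.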
Best practices for video data used to train AutoML models
The following practices apply to datasets used to train AutoML models.
The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry and low-resolution videos (such as from a security camera), your training data should be composed of blurry, low-resolution videos. In general, you should also consider providing multiple angles, resolutions, and backgrounds for your training videos.
Vertex AI models can't generally predict labels that humans can't assign. If a human can't be trained to assign labels by looking at the video for 1-2 seconds, the model likely can't be trained to do it either.
The model works best when there are at most 100 times more videos for the most common label than for the least common label. We recommend removing low-frequency labels. For video classification, the recommended number of training videos per label is about 1,000. The minimum per label is 10, or 50 for advanced models. In general, it takes more examples per label to train models with multiple labels per video, and the resulting scores are harder to interpret.
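The frequency guidance above (at most a 100x gap between the most and least common label, and a minimum number of examples per label) is easy to check before importing. The following sketch is a hypothetical helper, not part of any Vertex AI SDK:

```python
from collections import Counter

def check_label_balance(labels, max_ratio=100, min_per_label=10):
    """Flag labels that violate the rough frequency guidance above.

    `labels` is one label string per training segment. Returns a list of
    human-readable warnings; an empty list means no issue was detected.
    """
    counts = Counter(labels)
    most_common = max(counts.values())
    warnings = []
    for label, n in sorted(counts.items()):
        if n < min_per_label:
            warnings.append(f"{label}: only {n} examples (minimum is {min_per_label})")
        if most_common > max_ratio * n:
            warnings.append(f"{label}: the most common label has {most_common} "
                            f"examples, more than {max_ratio}x its {n}")
    return warnings
```

Labels that trigger warnings are candidates for either collecting more examples or being removed from the dataset.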
Schema files
Use the following publicly accessible schema file when creating the JSONL file for importing annotations. This schema file dictates the format of the data input files. The structure of the file follows the OpenAPI Schema.
Video classification schema file: gs://google-cloud-aiplatform/schema/dataset/ioformat/video_classification_io_format_1.0.0.yaml
Full schema file:
title: VideoClassification
description: >
  Import and export format for importing/exporting videos together with
  classification annotations with time segment. Can be used in
  Dataset.import_schema_uri field.
type: object
required:
- videoGcsUri
properties:
  videoGcsUri:
    type: string
    description: >
      A Cloud Storage URI pointing to a video. Up to 50 GB in size and
      up to 3 hours in duration. Supported file mime types: `video/mp4`,
      `video/avi`, `video/quicktime`.
  timeSegmentAnnotations:
    type: array
    description: >
      Multiple classification annotations. Each on a time segment of the video.
    items:
      type: object
      description: Annotation with a time segment on media (e.g., video).
      properties:
        displayName:
          type: string
          description: >
            It will be imported as/exported from AnnotationSpec's display name.
        startTime:
          type: string
          description: >
            The start of the time segment. Expressed as a number of seconds as
            measured from the start of the video, with "s" appended at the end.
            Fractions are allowed, up to a microsecond precision.
          default: 0s
        endTime:
          type: string
          description: >
            The end of the time segment. Expressed as a number of seconds as
            measured from the start of the video, with "s" appended at the end.
            Fractions are allowed, up to a microsecond precision, and "Infinity"
            is allowed, which corresponds to the end of the video.
          default: Infinity
        annotationResourceLabels:
          description: Resource labels on the Annotation.
          type: object
          additionalProperties:
            type: string
  dataItemResourceLabels:
    description: Resource labels on the DataItem.
    type: object
    additionalProperties:
      type: string
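A record can be sanity-checked against the schema above before import. The following is a hand-rolled minimal sketch, not an official validator: it only covers the required field and the time-segment string format described in the schema.

```python
import re

# Per the schema above, times are a number of seconds with "s" appended
# (fractions allowed up to microsecond precision); "Infinity" is allowed
# only for endTime and means the end of the video.
_TIME_RE = re.compile(r"^\d+(\.\d{1,6})?s$")

def validate_record(record: dict) -> None:
    """Minimal structural check of one import record. Raises ValueError."""
    if "videoGcsUri" not in record:
        raise ValueError("videoGcsUri is required")
    for ann in record.get("timeSegmentAnnotations", []):
        for key in ("startTime", "endTime"):
            value = ann.get(key)
            if value is None:
                continue  # the schema provides defaults (0s / Infinity)
            if key == "endTime" and value == "Infinity":
                continue
            if not _TIME_RE.match(value):
                raise ValueError(f"{key}={value!r} is not a valid time value")

validate_record({
    "videoGcsUri": "gs://demo/video1.mp4",
    "timeSegmentAnnotations": [
        {"displayName": "cartwheel", "startTime": "1.0s", "endTime": "12.0s"},
    ],
})  # passes silently
```

A full validation against the published YAML would also check types and `additionalProperties`; this sketch only catches the most common import mistakes.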
Input files
The format of your training data for video classification is as follows. To import your data, create either a JSONL or CSV file.
Note: When manually managing which videos are used for training or test purposes, be sure each instance of a video in the dataset has been assigned the same designation. If there's a contradiction, the dataset reverts to random assignments.

JSONL
JSON on each line:

{
	"videoGcsUri": "gs://bucket/filename.ext",
	"timeSegmentAnnotations": [{
		"displayName": "LABEL",
		"startTime": "start_time_of_segment",
		"endTime": "end_time_of_segment"
	}],
	"dataItemResourceLabels": {
		"aiplatform.googleapis.com/ml_use": "train|test"
	}
}

Example JSONL - Video classification:

{"videoGcsUri": "gs://demo/video1.mp4", "timeSegmentAnnotations": [{"displayName": "cartwheel", "startTime": "1.0s", "endTime": "12.0s"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"videoGcsUri": "gs://demo/video2.mp4", "timeSegmentAnnotations": [{"displayName": "swing", "startTime": "4.0s", "endTime": "9.0s"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "test"}}
...

CSV
Format of a row in the CSV:

[ML_USE,]VIDEO_URI,LABEL,START,END

List of columns:
ML_USE (Optional). For data split purposes when training a model. Use TRAINING or TEST.
VIDEO_URI. This field contains the Cloud Storage URI for the video. Cloud Storage URIs are case-sensitive.
LABEL. Labels must start with a letter and only contain letters, numbers, and underscores. You can specify multiple labels for a video by adding multiple rows in the CSV file that each identify the same video segment, with a different label for each row.
START,END. These two columns, START and END, respectively identify the start and end time of the video segment to analyze, in seconds. The start time must be less than the end time. Both values must be non-negative and within the time range of the video. For example, 0.09845,1.36005. To use the entire content of the video, specify a start time of 0 and an end time of the full length of the video, or "inf". For example, 0,inf.

Example CSV - Classification using single label
Single label on the same video segment:

TRAINING,gs://YOUR_VIDEO_PATH/vehicle.mp4,mustang,0,5.4
...

Example CSV - multiple labels
Multiple labels on the same video segment:

gs://YOUR_VIDEO_PATH/vehicle.mp4,fiesta,0,8.285
gs://YOUR_VIDEO_PATH/vehicle.mp4,ranger,0,8.285
gs://YOUR_VIDEO_PATH/vehicle.mp4,explorer,0,8.285
...

Example CSV - no labels
You can also provide videos in the data file without specifying any labels. You must then use the Google Cloud console to apply labels to your data before you train your model. To do so, you only need to provide the Cloud Storage URI for the video followed by three commas, as shown in the following example.

gs://YOUR_VIDEO_PATH/vehicle.mp4,,,
...
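A labeled CSV row and a JSONL record carry the same information, so converting between the two formats is mechanical. The sketch below is illustrative (the function name is not part of any Vertex AI SDK) and assumes a labeled row; it maps CSV "inf" end times to the JSONL "Infinity" value and the TRAINING/TEST column to the ml_use resource label.

```python
import csv
import io
import json

def csv_row_to_jsonl(row: list[str]) -> str:
    """Convert one [ML_USE,]VIDEO_URI,LABEL,START,END row into a JSONL record."""
    ml_use = None
    if row[0].upper() in ("TRAINING", "TEST"):
        ml_use = row[0].lower()  # "training" or "test"
        row = row[1:]
    uri, label, start, end = row
    record = {
        "videoGcsUri": uri,
        "timeSegmentAnnotations": [{
            "displayName": label,
            # Times are seconds with an "s" suffix; "inf" maps to "Infinity".
            "startTime": f"{start}s",
            "endTime": "Infinity" if end == "inf" else f"{end}s",
        }],
    }
    if ml_use:
        record["dataItemResourceLabels"] = {
            "aiplatform.googleapis.com/ml_use": ml_use,
        }
    return json.dumps(record)

line = next(csv.reader(io.StringIO("TRAINING,gs://demo/vehicle.mp4,mustang,0,5.4")))
print(csv_row_to_jsonl(line))
```

Rows without labels (the three-comma form) would need separate handling, since they carry no time segment to annotate.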
Last updated 2025-08-19 UTC.