This page explains how to configure AI-generated captions and translated captions (that is, translations) for a live stream.
AI-generated captions and translations are supported for HLS and DASH live streams.
Before you begin
This page assumes that you have completed the steps in the Before you begin section of the Quickstart for an HLS live stream or the Quickstart for an MPEG-DASH live stream.
Supported locations
Language codes for AI-generated captions are supported on a per-location basis.
| Location | Supported language codes |
|---|---|
| asia-northeast1 | en-US |
| asia-south1 | en-IN, en-GB, en-US |
| asia-southeast1 | en-US |
| australia-southeast1 | en-AU |
| europe-west1 | da-DK, nl-NL, en-GB, en-US, fr-FR, de-DE, it-IT, es-ES |
| europe-west2 | en-GB |
| europe-west3 | da-DK, nl-NL, en-GB, en-US, fr-FR, de-DE, it-IT, es-ES |
| northamerica-northeast1 | en-CA, fr-CA |
| us-central1 | en-US, pt-BR, es-CO, es-MX, es-US |
| us-east1 | en-US, pt-BR, es-CO, es-MX, es-US |
| us-west1 | en-US, pt-BR, es-CO, es-MX, es-US |
The AI-generated translation feature is supported only in us-west1 and
europe-west1. All supported language codes for AI-generated captions are also
supported as source and target language codes for AI-generated translations.
Use either us-west1 or europe-west1 for all API requests on this page.
Create the input endpoint
To create the input endpoint, use the
projects.locations.inputs.create method.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location in which to create the input endpoint; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- INPUT_ID: a user-defined identifier for the new input endpoint to create (to which you send your input stream). This value must be 1-63 characters, begin and end with [a-z0-9], and can contain dashes (-) between characters. For example, my-input.
Request JSON body:
{
"type": "RTMP_PUSH"
}
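To send your request, one option is curl. The following is a minimal sketch that assumes you save the request JSON body above as a file named request.json and authenticate with an access token from the gcloud CLI:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/inputs?inputId=INPUT_ID"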
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.video.livestream.v1.OperationMetadata",
"createTime": CREATE_TIME,
"target": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID",
"verb": "create",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": false
}
Copy the returned OPERATION_ID to use in the next section.
Check for the result
Use the projects.locations.operations.get method to check if
the input endpoint has been created. If the response contains "done: false",
repeat the command until the response contains "done: true". Creating the
first input endpoint in a region may take up to 10 minutes.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your input endpoint is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- OPERATION_ID: the identifier for the operation
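To send your request, one option is curl; a minimal sketch, assuming gcloud CLI authentication:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID"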
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.video.livestream.v1.OperationMetadata",
"createTime": CREATE_TIME,
"endTime": END_TIME,
"target": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID",
"verb": "create",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": true,
"response": {
"@type": "type.googleapis.com/google.cloud.video.livestream.v1.Input",
"name": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID",
"createTime": CREATE_TIME,
"updateTime": UPDATE_TIME,
"type": "RTMP_PUSH",
"uri": INPUT_STREAM_URI, # For example, "rtmp://1.2.3.4/live/b8ebdd94-c8d9-4d88-a16e-b963c43a953b",
"tier": "HD"
}
}
Find the uri field and copy the returned
INPUT_STREAM_URI to use later
in the Send the input stream section.
Create the channel
To create the channel, use the
projects.locations.channels.create method.
Note the following in the channel configuration:
- One ElementaryStream, a TextStream, is used for the captions.

  {
    "key": "webvtt_english_ai",
    "textStream": {
      "codec": "webvtt",
      "displayName": "English (AI captioned)",
      "languageCode": "en-US",
      "mapping": [
        {
          "inputTrack": 1 // audio track number
        }
      ]
    }
  }

- The TextStream codec field must be set to webvtt.
- The TextStream mapping field uses inputTrack to designate the input audio track to generate captions from. The mapping must not include an inputCeaChannel field, which is only used for user-supplied closed captions.
- Set the languageCode to the language spoken in the audio track.
- The elementary stream is used to create a MuxStream with key vtt_english_ai.

  {
    "key": "vtt_english_ai",
    "container": "vtt",
    "elementaryStreams": [
      "webvtt_english_ai"
    ],
    "segmentSettings": {
      "segmentDuration": "2s"
    }
  }

- This mux stream is then referenced in both HLS and DASH manifests.

  {
    "fileName": "main.m3u8",
    "type": "HLS",
    "muxStreams": [
      "mux_video_ts",
      "vtt_english_ai"
    ],
    "maxSegmentCount": 5
  }
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location in which to create the channel; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- CHANNEL_ID: a user-defined identifier for the channel to create; this value must be 1-63 characters, begin and end with [a-z0-9], and can contain dashes (-) between characters
- INPUT_ID: the user-defined identifier for the input endpoint
- BUCKET_NAME: the name of the Cloud Storage bucket you created to hold the live stream manifest and segment files
Request JSON body:
{
"inputAttachments": [
{
"key": "my-input",
"input": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID"
}
],
"output": {
"uri": "gs://BUCKET_NAME"
},
"elementaryStreams": [
{
"key": "es_video",
"videoStream": {
"h264": {
"profile": "high",
"widthPixels": 1280,
"heightPixels": 720,
"bitrateBps": 3000000,
"frameRate": 30
}
}
},
{
"key": "es_audio",
"audioStream": {
"codec": "aac",
"channelCount": 2,
"bitrateBps": 160000
}
},
{
"key": "webvtt_english_ai",
"textStream": {
"codec": "webvtt",
"displayName": "English (AI captioned)",
"languageCode": "en-US",
"mapping": [
{
"inputTrack": 1
}
]
}
}
],
"muxStreams": [
{
"key": "mux_video_fmp4",
"container": "fmp4",
"elementaryStreams": [
"es_video"
],
"segmentSettings": {
"segmentDuration": "2s"
}
},
{
"key": "mux_audio_fmp4",
"container": "fmp4",
"elementaryStreams": [
"es_audio"
],
"segmentSettings": {
"segmentDuration": "2s"
}
},
{
"key": "mux_video_ts",
"container": "ts",
"elementaryStreams": [
"es_video",
"es_audio"
],
"segmentSettings": {
"segmentDuration": "2s"
}
},
{
"key": "vtt_english_ai",
"container": "vtt",
"elementaryStreams": [
"webvtt_english_ai"
],
"segmentSettings": {
"segmentDuration": "2s"
}
}
],
"manifests": [
{
"key": "manifest_dash",
"fileName": "main.mpd",
"type": "DASH",
"muxStreams": [
"mux_video_fmp4",
"mux_audio_fmp4",
"vtt_english_ai"
],
"maxSegmentCount": 5
},
{
"key": "manifest_hls",
"fileName": "main.m3u8",
"type": "HLS",
"muxStreams": [
"mux_video_ts",
"vtt_english_ai"
],
"maxSegmentCount": 5
}
]
}
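To send your request, one option is curl. A minimal sketch, assuming the channel configuration above is saved as request.json and that you authenticate with the gcloud CLI:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/channels?channelId=CHANNEL_ID"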
You should receive a JSON response that contains a new operation ID. You can check for the result of the operation using the new operation ID, as described in the Check for the result section.
Get the channel
After the channel has been created, use the
projects.locations.channels.get method to query the channel
state.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your channel is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- CHANNEL_ID: a user-defined identifier for the channel
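To send your request, one option is curl; a minimal sketch, assuming gcloud CLI authentication:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/channels/CHANNEL_ID"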
You should receive a JSON response that contains the channel configuration.
The full response contains the following field:
{
...
"streamingState": "STOPPED"
...
}
This response indicates that you can now start the channel.
Start the channel
Use the projects.locations.channels.start method to start the
channel. A channel must be started before it can accept input streams or
generate an output stream.
Starting the first channel in a region takes about 10 minutes.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your channel is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- CHANNEL_ID: a user-defined identifier for the channel
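To send your request, one option is curl; a minimal sketch, assuming gcloud CLI authentication (the empty -d flag sends a POST request with no body):

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d "" \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/channels/CHANNEL_ID:start"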
You should receive a JSON response that identifies a long-running operation.
To determine if the channel has started, get the channel information as done previously. The response should contain the following:
{
...
"streamingState": "AWAITING_INPUT"
...
}
Send the input stream
Now that the channel is ready, send an input stream to the input
endpoint to generate the live stream. You can download an MP4 test file with
captions (TEST_VOD_FILE) and
use ffmpeg to send it to the input endpoint.
Open a new terminal window. Run the following command, using the INPUT_STREAM_URI from the Check for the result section:
ffmpeg -re -stream_loop -1 -i "TEST_VOD_FILE" \
-c:v copy -c:a aac -strict 2 -f "flv" "INPUT_STREAM_URI"
Verify the captions in the output manifest
Run the following command to see the contents of the generated HLS manifest:
gcloud storage cat gs://BUCKET_NAME/main.m3u8
The AI-generated WebVTT English captions appear in the output manifest, similar to the following:
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",LANGUAGE="en-US",NAME="English (AI captioned)",AUTOSELECT=YES,DEFAULT=YES,FORCED=NO,URI="vtt_english_ai/index-1.m3u8"
Stop the channel
You must stop a channel before you update the channel configuration.
Use the
projects.locations.channels.stop method to stop the channel.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your channel is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- CHANNEL_ID: a user-defined identifier for the channel
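To send your request, one option is curl; a minimal sketch, assuming gcloud CLI authentication:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d "" \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/channels/CHANNEL_ID:stop"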
You should receive a JSON response that identifies a long-running operation.
Update the channel configuration for a translation
Update the channel configuration to include translated captions (that is, a translation).
Note the following additions to the channel configuration:
- One ElementaryStream, a TextStream, is used for the captions for a single translated language.

  {
    "key": "webvtt_spanish_ai",
    "textStream": {
      "codec": "webvtt",
      "displayName": "Spanish (AI captioned)",
      "languageCode": "es-MX",
      "mapping": [
        {
          "inputTrack": 1, // audio track number
          "fromLanguageCode": "en-US" // original audio in English
        }
      ]
    }
  }

- Set languageCode to the chosen translated language.
- Set fromLanguageCode to the original source language in the audio track.
- The elementary stream is used to create a MuxStream with key vtt_spanish_ai.

  {
    "key": "vtt_spanish_ai",
    "container": "vtt",
    "elementaryStreams": [
      "webvtt_spanish_ai"
    ],
    "segmentSettings": {
      "segmentDuration": "2s"
    }
  }

- This mux stream is then referenced in both HLS and DASH manifests.

  {
    "fileName": "main.m3u8",
    "type": "HLS",
    "muxStreams": [
      "mux_video_ts",
      "vtt_english_ai",
      "vtt_spanish_ai"
    ],
    "maxSegmentCount": 5
  }
To update the channel, use the
projects.locations.channels.patch method.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your channel is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- CHANNEL_ID: the user-defined identifier for the channel to update; this value must be 1-63 characters, begin and end with [a-z0-9], and can contain dashes (-) between characters
Request JSON body:
{
"inputAttachments": [
{
"key": "my-input",
"input": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID"
}
],
"output": {
"uri": "gs://BUCKET_NAME"
},
"elementaryStreams": [
{
"key": "es_video",
"videoStream": {
"h264": {
"profile": "high",
"widthPixels": 1280,
"heightPixels": 720,
"bitrateBps": 3000000,
"frameRate": 30
}
}
},
{
"key": "es_audio",
"audioStream": {
"codec": "aac",
"channelCount": 2,
"bitrateBps": 160000
}
},
{
"key": "webvtt_english_ai",
"textStream": {
"codec": "webvtt",
"displayName": "English (AI captioned)",
"languageCode": "en-US",
"mapping": [
{
"inputTrack": 1
}
]
}
},
{
"key": "webvtt_spanish_ai",
"textStream": {
"codec": "webvtt",
"displayName": "Spanish (AI captioned)",
"languageCode": "es-MX",
"mapping": [
{
"inputTrack": 1,
"fromLanguageCode": "en-US"
}
]
}
}
],
"muxStreams": [
{
"key": "mux_video_fmp4",
"container": "fmp4",
"elementaryStreams": [
"es_video"
],
"segmentSettings": {
"segmentDuration": "2s"
}
},
{
"key": "mux_audio_fmp4",
"container": "fmp4",
"elementaryStreams": [
"es_audio"
],
"segmentSettings": {
"segmentDuration": "2s"
}
},
{
"key": "mux_video_ts",
"container": "ts",
"elementaryStreams": [
"es_video",
"es_audio"
],
"segmentSettings": {
"segmentDuration": "2s"
}
},
{
"key": "vtt_english_ai",
"container": "vtt",
"elementaryStreams": [
"webvtt_english_ai"
],
"segmentSettings": {
"segmentDuration": "2s"
}
},
{
"key": "vtt_spanish_ai",
"container": "vtt",
"elementaryStreams": [
"webvtt_spanish_ai"
],
"segmentSettings": {
"segmentDuration": "2s"
}
}
],
"manifests": [
{
"key": "manifest_dash",
"fileName": "main.mpd",
"type": "DASH",
"muxStreams": [
"mux_video_fmp4",
"mux_audio_fmp4",
"vtt_english_ai",
"vtt_spanish_ai"
],
"maxSegmentCount": 5
},
{
"key": "manifest_hls",
"fileName": "main.m3u8",
"type": "HLS",
"muxStreams": [
"mux_video_ts",
"vtt_english_ai",
"vtt_spanish_ai"
],
"maxSegmentCount": 5
}
]
}
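To send your request, one option is curl. A minimal sketch, assuming the updated configuration above is saved as request.json, that you authenticate with the gcloud CLI, and that the updateMask query parameter lists the fields you changed (here assumed to be elementaryStreams, muxStreams, and manifests):

curl -X PATCH \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/channels/CHANNEL_ID?updateMask=elementaryStreams,muxStreams,manifests"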
You should receive a JSON response that identifies a long-running operation.
You can check for the result of the operation using the new operation ID.
Restart the channel and resend the input stream
As done previously, start the channel and send the input stream again.
Verify the translation in the output manifest
Run the following command to see the contents of the generated HLS manifest:
gcloud storage cat gs://BUCKET_NAME/main.m3u8
The AI-generated WebVTT English and Spanish captions appear in the output manifest, similar to the following:
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",LANGUAGE="en-US",NAME="English (AI captioned)",AUTOSELECT=YES,DEFAULT=YES,FORCED=NO,URI="vtt_english_ai/index-1.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",LANGUAGE="es-MX",NAME="Spanish (AI captioned)",AUTOSELECT=NO,DEFAULT=NO,FORCED=NO,URI="vtt_spanish_ai/index-1.m3u8"
Configure AI-generated transcripts for distribution
Before configuring AI-generated captions and translations for distribution, familiarize yourself with how to distribute live streams to remote endpoints.
To distribute AI-generated captions and translations to remote endpoints,
don't reference the webvtt text streams that were configured previously when
creating the channel with AI-generated captions and updating the channel for a
translation.
Text streams for distribution must be embedded captions, with the
codec set to cea608 or cea708.
In the channel configuration, consider the following:
- One ElementaryStream, a TextStream, is used for the captions, and another ElementaryStream, a TextStream, is used for the translation.

  {
    "key": "embedded_english_ai",
    "textStream": {
      "codec": "cea608",
      "displayName": "English (AI captioned)",
      "languageCode": "en-US",
      "outputCeaChannel": "CC1", // closed caption channel that will contain AI-generated English captions
      "mapping": [
        {
          "inputTrack": 1 // audio track number
        }
      ]
    }
  }

  {
    "key": "embedded_spanish_ai",
    "textStream": {
      "codec": "cea608",
      "displayName": "Spanish (AI captioned)",
      "languageCode": "es-MX",
      "outputCeaChannel": "CC2", // closed caption channel that will contain AI-generated Spanish captions
      "mapping": [
        {
          "inputTrack": 1, // audio track number
          "fromLanguageCode": "en-US" // original audio in English
        }
      ]
    }
  }

- The TextStream codec field must be set to either cea608 or cea708.
- The TextStream mapping field uses inputTrack to designate the input audio track to generate captions from. The mapping must not include an inputCeaChannel field, which is only used for user-supplied closed captions.
- Set the outputCeaChannel to the closed caption channel that will carry the AI-generated transcripts in the distribution stream.
- Set the languageCode to the language spoken in the audio track.
- The elementary streams are included in the elementaryStreams field of the DistributionStream object.

  {
    "distributionStreams": [
      {
        "key": "ds1",
        "container": "ts", // to be used with an SRT distribution
        "elementaryStreams": [
          "es_video",
          "es_audio",
          "embedded_english_ai", // distribute both English
          "embedded_spanish_ai" // and Spanish captions
        ]
      },
      {
        "key": "ds2",
        "container": "flv", // to be used with an RTMP distribution
        "elementaryStreams": [
          "es_video",
          "es_audio",
          "embedded_spanish_ai" // only distribute Spanish captions
        ]
      }
    ]
  }
Complete the channel creation by following the guide on distributing a live stream to a remote endpoint.
Advanced configurations
Set AutoTranscriptionConfig at the channel level
to further tune the AI-generated text streams to your needs.
Caption display timing
By default, AI-generated captions are displayed asynchronously with the audio
and video. Set the DisplayTiming
field in AutoTranscriptionConfig to SYNC to display them synchronously.
{
"autoTranscriptionConfig": {
"displayTiming": "SYNC"
}
}
Displaying captions synchronously decreases the viewing latency between audio and text but increases the overall end-to-end media latency.
Quality presets
Use the QualityPreset field in
AutoTranscriptionConfig to configure the quality preferences for AI-generated
text streams.
For example, you can further reduce the latency of text stream generation by setting:
{
"autoTranscriptionConfig" : {
"qualityPreset": "LOW_LATENCY"
}
}
Clean up
Stop the channel
Use the
projects.locations.channels.stop method to stop the channel.
You must stop the channel before you can delete it.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your channel is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- CHANNEL_ID: a user-defined identifier for the channel
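To send your request, one option is curl, as before; a minimal sketch, assuming gcloud CLI authentication:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d "" \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/channels/CHANNEL_ID:stop"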
You should receive a JSON response that identifies a long-running operation.
Stop the input stream
If you used ffmpeg to send the input stream, the connection is automatically
broken after you stop the channel.
Delete the channel
Use the
projects.locations.channels.delete method to delete the
channel. You must delete the channel before you can delete the input endpoint
that is used by the channel.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your channel is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- CHANNEL_ID: a user-defined identifier for the channel
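To send your request, one option is curl; a minimal sketch, assuming gcloud CLI authentication:

curl -X DELETE \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/channels/CHANNEL_ID"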
You should receive a JSON response that identifies a long-running operation.
Delete the input endpoint
Use the
projects.locations.inputs.delete method to delete the input
endpoint.
Before using any of the request data, make the following replacements:
- PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
- LOCATION: the location where your input endpoint is located; use one of the supported regions: us-central1, us-east1, us-east4, us-west1, us-west2, northamerica-northeast1, southamerica-east1, asia-east1, asia-east2, asia-south1, asia-northeast1, asia-southeast1, australia-southeast1, europe-north1, europe-west1, europe-west2, europe-west3, europe-west4
- INPUT_ID: the user-defined identifier for the input endpoint
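To send your request, one option is curl; a minimal sketch, assuming gcloud CLI authentication:

curl -X DELETE \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://livestream.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID"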
You should receive a JSON response that identifies a long-running operation.
Delete the Cloud Storage bucket
In the Google Cloud console, go to the Cloud Storage Browser page.
Select the checkbox next to the bucket that you created.
Click Delete.
In the dialog window that appears, click Delete to delete the bucket and its contents.