Configure automatic captions and translations

This page explains how to configure AI-generated captions and translated captions (that is, translations) for a live stream.

AI-generated captions and translations are supported for HLS and DASH live streams.

Before you begin

This page assumes that you have completed the steps in the Before you begin section of the Quickstart for an HLS live stream or the Quickstart for an MPEG-DASH live stream.

Supported locations

Language codes for AI-generated captions are supported on a per-location basis.

Show locations

Location Supported language codes
asia-northeast1 en-US
asia-south1 en-IN
en-GB
en-US
asia-southeast1 en-US
australia-southeast1 en-AU
europe-west1 da-DK
nl-NL
en-GB
en-US
fr-FR
de-DE
it-IT
es-ES
europe-west2 en-GB
europe-west3 da-DK
nl-NL
en-GB
en-US
fr-FR
de-DE
it-IT
es-ES
northamerica-northeast1 en-CA
fr-CA
us-central1 en-US
pt-BR
es-CO
es-MX
es-US
us-east1 en-US
pt-BR
es-CO
es-MX
es-US
us-west1 en-US
pt-BR
es-CO
es-MX
es-US

The AI-generated translation feature is only supported on us-west1 and europe-west1.

Use either us-west1 or europe-west1 for all API requests on this page.

Create the input endpoint

To create the input endpoint, use the projects.locations.inputs.create method.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location in which to create the input endpoint; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • INPUT_ID: a user-defined identifier for the new input endpoint to create (to which you send your input stream). This value must be 1-63 characters, begin and end with [a-z0-9], and can contain dashes (-) between characters. For example, my-input.

Request JSON body:

{
  "type": "RTMP_PUSH"
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.video.livestream.v1.OperationMetadata",
    "createTime": CREATE_TIME,
    "target": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}

Copy the returned OPERATION_ID to use in the next section.

Check for the result

Use the projects.locations.operations.get method to check if the input endpoint has been created. If the response contains "done: false", repeat the command until the response contains "done: true". Creating the first input endpoint in a region may take up to 10 minutes.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location where your input endpoint is located; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • OPERATION_ID: the identifier for the operation

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.video.livestream.v1.OperationMetadata",
    "createTime": CREATE_TIME,
    "endTime": END_TIME,
    "target": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.video.livestream.v1.Input",
    "name": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID",
    "createTime": CREATE_TIME,
    "updateTime": UPDATE_TIME,
    "type": "RTMP_PUSH",
    "uri":  INPUT_STREAM_URI, # For example, "rtmp://1.2.3.4/live/b8ebdd94-c8d9-4d88-a16e-b963c43a953b",
    "tier": "HD"
  }
}

Find the uri field and copy the returned INPUT_STREAM_URI to use later in the Send the input stream section.

Create the channel

To create the channel, use the projects.locations.channels.create method.

Note the following in the channel configuration:

  • One ElementaryStream, a TextStream, is used for the captions.

    {
      "key": "webvtt_english_ai",
      "textStream": {
        "codec": "webvtt",
        "displayName": "English (AI captioned)",
        "languageCode": "en-US",
        "mapping": [
          {
            "inputTrack": 1 // audio track number
          }
        ]
      }
    }
    
  • The TextStream codec field must be set to webvtt.

  • The TextStream mapping field uses inputTrack to designate the input audio track to generate captions from. The mapping must not include a inputCeaChannel field, which is only used for user-supplied closed captions.

  • Set the languageCode to the language spoken in the audio track.

  • The elementary stream is used to create a MuxStream with key vtt_english_ai.

    {
      "key": "vtt_english_ai",
      "container": "vtt",
      "elementaryStreams": [
        "webvtt_english_ai"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    }
    

    This mux stream is then referenced in both HLS and DASH manifests.

    {
      "fileName": "main.m3u8",
      "type": "HLS",
      "muxStreams": [
        "mux_video_ts",
        "vtt_english_ai"
      ],
      "maxSegmentCount": 5
    }
    

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location in which to create the channel; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • CHANNEL_ID: a user-defined identifier for the channel to create; this value must be 1-63 characters, begin and end with [a-z0-9], and can contain dashes (-) between characters
  • INPUT_ID: the user-defined identifier for the input endpoint
  • BUCKET_NAME: the name of the Cloud Storage bucket you created to hold the live stream manifest and segment files

Request JSON body:

{
  "inputAttachments": [
    {
      "key": "my-input",
      "input": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID"
    }
  ],
  "output": {
    "uri": "gs://BUCKET_NAME"
  },
  "elementaryStreams": [
    {
      "key": "es_video",
      "videoStream": {
        "h264": {
          "profile": "high",
          "widthPixels": 1280,
          "heightPixels": 720,
          "bitrateBps": 3000000,
          "frameRate": 30
        }
      }
    },
    {
      "key": "es_audio",
      "audioStream": {
        "codec": "aac",
        "channelCount": 2,
        "bitrateBps": 160000
      }
    },
    {
      "key": "webvtt_english_ai",
      "textStream": {
        "codec": "webvtt",
        "displayName": "English (AI captioned)",
        "languageCode": "en-US",
        "mapping": [
          {
            "inputTrack": 1
          }
        ]
      }
    }
  ],
  "muxStreams": [
    {
      "key": "mux_video_fmp4",
      "container": "fmp4",
      "elementaryStreams": [
        "es_video"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    },
    {
      "key": "mux_audio_fmp4",
      "container": "fmp4",
      "elementaryStreams": [
        "es_audio"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    },
    {
      "key": "mux_video_ts",
      "container": "ts",
      "elementaryStreams": [
        "es_video",
        "es_audio"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    },
    {
      "key": "vtt_english_ai",
      "container": "vtt",
      "elementaryStreams": [
        "webvtt_english_ai"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    }
  ],
  "manifests": [
    {
      "key": "manifest_dash",
      "fileName": "main.mpd",
      "type": "DASH",
      "muxStreams": [
        "mux_video_fmp4",
        "mux_audio_fmp4",
        "vtt_english_ai"
      ],
      "maxSegmentCount": 5
    },
    {
      "key": "manifest_hls",
      "fileName": "main.m3u8",
      "type": "HLS",
      "muxStreams": [
        "mux_video_ts",
        "vtt_english_ai"
      ],
      "maxSegmentCount": 5
    }
  ]
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

Get the channel

You can check for the result of the operation using the new operation ID.

After the channel has been created, use the projects.locations.channels.get method to query the channel state.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location where your channel is located; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • CHANNEL_ID: a user-defined identifier for the channel

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

The full response contains the following field:

{
  ...
  "streamingState": "STOPPED"
  ...
}

This response indicates that you can now start the channel.

Start the channel

Use the projects.locations.channels.start method to start the channel. A channel must be started before it can accept input streams or generate an output stream.

Starting the first channel in a region takes about 10 minutes.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location where your channel is located; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • CHANNEL_ID: a user-defined identifier for the channel

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

To determine if the channel has started, get the channel information as done previously. The response should contain the following:

{
  ...
  "streamingState": "AWAITING_INPUT"
  ...
}

Send the input stream

Now that the channel is ready, send an input stream to the input endpoint to generate the live stream. You can download an MP4 (or other TEST_VOD_FILE) with captions and use ffmpeg to send it to the input endpoint.

Open a new terminal window. Run the following command, using the INPUT_STREAM_URI from the Check for the result section:

ffmpeg -re -stream_loop -1 -i "TEST_VOD_FILE" \
  -c:v copy -c:a aac -strict 2 -f "flv" "INPUT_STREAM_URI"

Verify the captions in the output manifest

Run the following command to see the contents of the generated HLS manifest:

gcloud storage cat gs://BUCKET_NAME/main.m3u8

The AI-generated webvtt English captions show up in the output manifest similar to the following:

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",LANGUAGE="en-US",NAME="English (AI captioned)",AUTOSELECT=YES,DEFAULT=YES,FORCED=NO,URI="vtt_english_ai/index-1.m3u8"

Stop the channel

You must stop a channel before you update the channel configuration.

Use the projects.locations.channels.stop method to stop the channel.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location where your channel is located; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • CHANNEL_ID: a user-defined identifier for the channel

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

Update the channel configuration for a translation

Update the channel configuration to include translated captions (that is, a translation).

Note the following additions to the channel configuration:

  • One ElementaryStream, a TextStream, is used for the captions for a single translated language.

    {
      "key": "webvtt_spanish_ai",
      "textStream": {
        "codec": "webvtt",
        "displayName": "Spanish (AI captioned)",
        "languageCode": "es-MX",
        "mapping": [
          {
            "inputTrack": 1, // audio track number
            "fromLanguageCode": "en-US" // original audio in English
          }
        ]
      }
    }
    
  • Set languageCode to the chosen translated language.

  • Set fromLanguageCode to the original source language in the audio track.

  • The elementary stream is used to create a MuxStream with key vtt_spanish_ai.

    {
      "key": "vtt_spanish_ai",
      "container": "vtt",
      "elementaryStreams": [
        "webvtt_spanish_ai"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    }
    

    This mux stream is then referenced in both HLS and DASH manifests.

    {
      "fileName": "main.m3u8",
      "type": "HLS",
      "muxStreams": [
        "mux_video_ts",
        "vtt_english_ai",
        "vtt_spanish_ai"
      ],
      "maxSegmentCount": 5
    }
    

To update the channel, use the projects.locations.channels.patch method.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location in which to create the channel; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • CHANNEL_ID: a user-defined identifier for the channel to create; this value must be 1-63 characters, begin and end with [a-z0-9], and can contain dashes (-) between characters

Request JSON body:

{
  "inputAttachments": [
    {
      "key": "my-input",
      "input": "projects/PROJECT_NUMBER/locations/LOCATION/inputs/INPUT_ID"
    }
  ],
  "output": {
    "uri": "gs://BUCKET_NAME"
  },
  "elementaryStreams": [
    {
      "key": "es_video",
      "videoStream": {
        "h264": {
          "profile": "high",
          "widthPixels": 1280,
          "heightPixels": 720,
          "bitrateBps": 3000000,
          "frameRate": 30
        }
      }
    },
    {
      "key": "es_audio",
      "audioStream": {
        "codec": "aac",
        "channelCount": 2,
        "bitrateBps": 160000
      }
    },
    {
      "key": "webvtt_english_ai",
      "textStream": {
        "codec": "webvtt",
        "displayName": "English (AI captioned)",
        "languageCode": "en-US",
        "mapping": [
          {
            "inputTrack": 1
          }
        ]
      }
    },
    {
      "key": "webvtt_spanish_ai",
      "textStream": {
        "codec": "webvtt",
        "displayName": "Spanish (AI captioned)",
        "languageCode": "es-MX",
        "mapping": [
          {
            "inputTrack": 1,
            "fromLanguageCode": "en-US"
          }
        ]
      }
    }
  ],
  "muxStreams": [
    {
      "key": "mux_video_fmp4",
      "container": "fmp4",
      "elementaryStreams": [
        "es_video"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    },
    {
      "key": "mux_audio_fmp4",
      "container": "fmp4",
      "elementaryStreams": [
        "es_audio"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    },
    {
      "key": "mux_video_ts",
      "container": "ts",
      "elementaryStreams": [
        "es_video",
        "es_audio"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    },
    {
      "key": "vtt_english_ai",
      "container": "vtt",
      "elementaryStreams": [
        "webvtt_english_ai"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    },
    {
      "key": "vtt_spanish_ai",
      "container": "vtt",
      "elementaryStreams": [
        "webvtt_spanish_ai"
      ],
      "segmentSettings": {
        "segmentDuration": "2s"
      }
    }
  ],
  "manifests": [
    {
      "key": "manifest_dash",
      "fileName": "main.mpd",
      "type": "DASH",
      "muxStreams": [
        "mux_video_fmp4",
        "mux_audio_fmp4",
        "vtt_english_ai",
        "vtt_spanish_ai"
      ],
      "maxSegmentCount": 5
    },
    {
      "key": "manifest_hls",
      "fileName": "main.m3u8",
      "type": "HLS",
      "muxStreams": [
        "mux_video_ts",
        "vtt_english_ai",
        "vtt_spanish_ai"
      ],
      "maxSegmentCount": 5
    }
  ]
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

You can check for the result of the operation using the new operation ID.

Restart the channel and resend the input stream

As done previously, start the channel and send the input stream again.

Verify the translation in the output manifest

Run the following command to see the contents of the generated HLS manifest:

gcloud storage cat gs://BUCKET_NAME/main.m3u8

The AI-generated webvtt English and Spanish captions show up in the output manifest similar to the following:

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",LANGUAGE="en-US",NAME="English (AI captioned)",AUTOSELECT=YES,DEFAULT=YES,FORCED=NO,URI="vtt_english_ai/index-1.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="sub",LANGUAGE="es-MX",NAME="Spanish (AI captioned)",AUTOSELECT=NO,DEFAULT=NO,FORCED=NO,URI="vtt_spanish_ai/index-1.m3u8"

Advanced configurations

Set AutoTranscriptionConfig at the channel level to further tune the AI-generated text streams to your needs.

Caption display timing

By default, AI-generated captions are displayed asynchronously with the audio and video. Set the DisplayTiming field in AutoTranscriptionConfig to SYNC to display them synchronously.

{
  "autoTranscriptionConfig": {
    "displayTiming": "SYNC"
  }
}

Displaying captions synchronously decreases the viewing latency between audio and text but increases the overall end-to-end media latency.

Quality presets

Use the QualityPreset field in AutoTranscriptionConfig to configure the quality preferences for AI-generated text streams.

Possible values are:

  • LOW_LATENCY (low latency but possibly low accuracy)
  • BALANCED_QUALITY (default)
  • IMPROVED_QUALITY (high accuracy but high latency)

For example, you can further reduce the latency of text stream generation by setting:

{
  "autoTranscriptionConfig" : {
    "qualityPreset": "LOW_LATENCY"
  }
}

Clean up

Stop the channel

Use the projects.locations.channels.stop method to stop the channel. You must stop the channel before you can delete it.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location where your channel is located; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • CHANNEL_ID: a user-defined identifier for the channel

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

Stop the input stream

If you used ffmpeg to send the input stream, the connection is automatically broken after you stop the channel.

Delete the channel

Use the projects.locations.channels.delete method to delete the channel. You must delete the channel before you can delete the input endpoint that is used by the channel.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location where your channel is located; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • CHANNEL_ID: a user-defined identifier for the channel

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

Delete the input endpoint

Use the projects.locations.inputs.delete method to delete the input endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_NUMBER: your Google Cloud project number; this is located in the Project number field on the IAM Settings page
  • LOCATION: the location where your input endpoint is located; use one of the supported regions
    Show locations
    • us-central1
    • us-east1
    • us-east4
    • us-west1
    • us-west2
    • northamerica-northeast1
    • southamerica-east1
    • asia-east1
    • asia-east2
    • asia-south1
    • asia-northeast1
    • asia-southeast1
    • australia-southeast1
    • europe-north1
    • europe-west1
    • europe-west2
    • europe-west3
    • europe-west4
  • INPUT_ID: the user-defined identifier for the input endpoint

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

Delete the Cloud Storage bucket

  1. In the Google Cloud console, go to the Cloud Storage Browser page.

    Go to the Cloud Storage Browser page

  2. Select the checkbox next to the bucket that you created.

  3. Click Delete.

  4. In the dialog window that appears, click Delete to delete the bucket and its contents.