Method: projects.locations.recognizers.batchRecognize

Performs batch asynchronous speech recognition: send a request with N audio files and receive a long-running operation that can be polled to see when the transcriptions are finished.

HTTP request

POST https://{endpoint}/v2/{recognizer=projects/*/locations/*/recognizers/*}:batchRecognize

Where {endpoint} is one of the supported service endpoints.

The URLs use gRPC Transcoding syntax.

Path parameters

Parameters
recognizer

string

Required. The name of the Recognizer to use during recognition. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}. The {recognizer} segment may be set to _ to use an empty implicit Recognizer.

Request body

The request body contains data with the following structure:

JSON representation
{
  "config": {
    object (RecognitionConfig)
  },
  "configMask": string,
  "files": [
    {
      object (BatchRecognizeFileMetadata)
    }
  ],
  "recognitionOutputConfig": {
    object (RecognitionOutputConfig)
  },
  "processingStrategy": enum (ProcessingStrategy)
}
Fields
config

object (RecognitionConfig)

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the configMask field can be used to override parts of the defaultRecognitionConfig of the Recognizer resource.

configMask

string (FieldMask format)

The list of fields in config that override the values in the defaultRecognitionConfig of the recognizer during this recognition request. If no mask is provided, all given fields in config override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, config completely overrides and replaces the config in the recognizer for this recognition request.

This is a comma-separated list of fully qualified names of fields. Example: "user.displayName,photo".

files[]

object (BatchRecognizeFileMetadata)

Audio files with file metadata for ASR. At most 15 files can be specified.

recognitionOutputConfig

object (RecognitionOutputConfig)

Configuration options for where to output the transcripts of each file.

processingStrategy

enum (ProcessingStrategy)

Processing strategy to use for this request.
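As an illustration of how the URL template and request body fit together, the following is a minimal sketch that only assembles the URL and JSON payload and sends nothing. The helper name build_batch_recognize_request, the endpoint value, the project, recognizer, and bucket names are all assumptions made for this example. The 15-file check mirrors the limit stated for files[]; the service performs its own validation.

```python
import json

MAX_FILES = 15  # limit stated for files[] in the field description


def build_batch_recognize_request(endpoint, recognizer, uris, output_prefix):
    """Return (url, body) for a recognizers.batchRecognize call.

    Hypothetical helper: it only assembles data and performs no network I/O.
    """
    if len(uris) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} files are allowed, got {len(uris)}")
    # The recognizer path fills the {recognizer=...} segment of the URL template.
    url = f"https://{endpoint}/v2/{recognizer}:batchRecognize"
    body = {
        # One BatchRecognizeFileMetadata entry per audio file.
        "files": [{"uri": uri} for uri in uris],
        # Write transcripts under a Cloud Storage prefix.
        "recognitionOutputConfig": {"gcsOutputConfig": {"uri": output_prefix}},
    }
    return url, body


url, body = build_batch_recognize_request(
    endpoint="speech.googleapis.com",  # assumed endpoint for this example
    recognizer="projects/my-project/locations/global/recognizers/my-recognizer",
    uris=["gs://my-bucket/audio/a.wav", "gs://my-bucket/audio/b.wav"],
    output_prefix="gs://my-bucket/transcripts/",
)
print(url)
print(json.dumps(body, indent=2))
```

Because config is omitted here, the sketch relies on the defaultRecognitionConfig of the named recognizer. A real call would also need an Authorization header carrying a token for the OAuth scope listed under Authorization scopes.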

Response body

If successful, the response body contains an instance of Operation.
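Since the returned Operation has to be polled until it completes, a polling loop might look like the sketch below. It assumes the standard long-running Operation JSON fields (done, error, response). The get_operation callable is a hypothetical stand-in for whatever function fetches the operation by name; it is injected so the example runs without network access.

```python
import time


def wait_for_operation(get_operation, name, interval_s=5.0, timeout_s=600.0,
                       sleep=time.sleep):
    """Poll until the Operation reports done, then return its response.

    get_operation is a hypothetical callable: name -> Operation as a dict.
    """
    waited = 0.0
    while True:
        op = get_operation(name)
        if op.get("done"):
            # A finished operation carries either an error or a response.
            if "error" in op:
                raise RuntimeError(f"operation {name} failed: {op['error']}")
            return op.get("response", {})
        if waited >= timeout_s:
            raise TimeoutError(f"operation {name} not done after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s


# Simulated server states; the response payload is a stand-in, not real output.
_states = iter([
    {"name": "operations/123", "done": False},
    {"name": "operations/123", "done": True,
     "response": {"note": "stand-in payload"}},
])
result = wait_for_operation(lambda name: next(_states), "operations/123",
                            sleep=lambda seconds: None)
print(result)
```

The default timeout is arbitrary. Requests submitted with DYNAMIC_BATCHING can take up to 24 hours, so a much longer timeout and polling interval would be appropriate there.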

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the recognizer resource:

  • speech.recognizers.recognize

For more information, see the IAM documentation.

BatchRecognizeFileMetadata

Metadata about a single file in a batch for recognizers.batchRecognize.

JSON representation
{
  "config": {
    object (RecognitionConfig)
  },
  "configMask": string,

  // Union field audio_source can be only one of the following:
  "uri": string
  // End of list of possible types for union field audio_source.
}
Fields
config

object (RecognitionConfig)

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the configMask field can be used to override parts of the defaultRecognitionConfig of the Recognizer resource as well as the config at the request level.

configMask

string (FieldMask format)

The list of fields in config that override the values in the defaultRecognitionConfig of the recognizer during this recognition request. If no mask is provided, all non-default valued fields in config override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, config completely overrides and replaces the config in the recognizer for this recognition request.

This is a comma-separated list of fully qualified names of fields. Example: "user.displayName,photo".

Union field audio_source. The audio source, which is a Google Cloud Storage URI. audio_source can be only one of the following:
uri

string

Cloud Storage URI for the audio file.
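The layering of recognizer defaults, request-level config, and per-file config can be hard to picture, so here is a simplified local model of the configMask rules. It is a sketch only: it handles top-level field names, treats a field named in the mask but absent from config as reset (an assumption based on general FieldMask behavior), and does not reproduce the service's actual merge logic. The field names model and languageCodes and their values are illustrative and are not defined on this page.

```python
def apply_override(base, config, mask=None):
    """Overlay config on base following the configMask rules (simplified).

    Top-level fields only; nested mask paths are not handled in this sketch.
    """
    if not mask:
        # No mask: every field given in config overrides the base value.
        return {**base, **config}
    if mask.strip() == "*":
        # Wildcard: config completely replaces the base config.
        return dict(config)
    merged = dict(base)
    for path in (p.strip() for p in mask.split(",")):
        if path in config:
            merged[path] = config[path]
        else:
            # Assumption: a masked field missing from config is reset.
            merged.pop(path, None)
    return merged


# Illustrative values only.
recognizer_default = {"model": "long", "languageCodes": ["en-US"]}
request_config = {"languageCodes": ["de-DE"]}
file_config = {"model": "short", "languageCodes": ["fr-FR"]}

# Request level, no mask: languageCodes is overridden, model is kept.
effective = apply_override(recognizer_default, request_config)
# File level, mask "model": only model is taken from the file config.
effective_file = apply_override(effective, file_config, mask="model")
# Wildcard: the request config replaces the recognizer default entirely.
replaced = apply_override(recognizer_default, request_config, mask="*")

print(effective)
print(effective_file)
print(replaced)
```

The example shows why a mask is useful at the file level: without it, the file's languageCodes would also have replaced the request-level value.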

RecognitionOutputConfig

Configuration options for the output(s) of recognition.

JSON representation
{
  "outputFormatConfig": {
    object (OutputFormatConfig)
  },

  // Union field output can be only one of the following:
  "gcsOutputConfig": {
    object (GcsOutputConfig)
  },
  "inlineResponseConfig": {
    object (InlineOutputConfig)
  }
  // End of list of possible types for union field output.
}
Fields
outputFormatConfig

object (OutputFormatConfig)

Optional. Configuration for the format of the results stored to output. If unspecified, transcripts are written in the NATIVE format only.

Union field output.

output can be only one of the following:

gcsOutputConfig

object (GcsOutputConfig)

If this message is populated, recognition results are written to the provided Google Cloud Storage URI.

inlineResponseConfig

object (InlineOutputConfig)

If this message is populated, recognition results are provided in the BatchRecognizeResponse message of the Operation when completed. This is only supported when calling recognizers.batchRecognize with just one audio file.
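The two constraints on the output union (only one member may be set, and inline output is limited to a single audio file) can be checked before a request is sent. The following is a minimal sketch; build_output_config and its parameters are hypothetical names introduced for this example.

```python
def build_output_config(file_count, gcs_uri=None, inline=False):
    """Build a RecognitionOutputConfig dict with one member of the output union."""
    if gcs_uri is not None and inline:
        raise ValueError(
            "output is a union: set gcsOutputConfig or inlineResponseConfig, not both")
    if inline:
        if file_count != 1:
            raise ValueError(
                "inlineResponseConfig is only supported with just one audio file")
        # InlineOutputConfig has no fields, so it serializes as an empty object.
        return {"inlineResponseConfig": {}}
    if gcs_uri is not None:
        return {"gcsOutputConfig": {"uri": gcs_uri}}
    # Neither member set: leave the union empty.
    return {}


print(build_output_config(file_count=3, gcs_uri="gs://my-bucket/transcripts/"))
print(build_output_config(file_count=1, inline=True))
```

With inline output, the transcripts arrive in the BatchRecognizeResponse of the completed Operation rather than in Cloud Storage.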

GcsOutputConfig

Output configurations for Cloud Storage.

JSON representation
{
  "uri": string
}
Fields
uri

string

The Cloud Storage URI prefix with which recognition results will be written.

InlineOutputConfig

This type has no fields.

Output configurations for inline response.

OutputFormatConfig

Configuration for the format of the results stored to output.

JSON representation
{
  "native": {
    object (NativeOutputFileFormatConfig)
  },
  "vtt": {
    object (VttOutputFileFormatConfig)
  },
  "srt": {
    object (SrtOutputFileFormatConfig)
  }
}
Fields
native

object (NativeOutputFileFormatConfig)

Configuration for the native output format. If this field is set or if no other output format field is set, then transcripts will be written to the sink in the native format.

vtt

object (VttOutputFileFormatConfig)

Configuration for the VTT output format. If this field is set, then transcripts will be written to the sink in the VTT format.

srt

object (SrtOutputFileFormatConfig)

Configuration for the SRT output format. If this field is set, then transcripts will be written to the sink in the SRT format.
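Because the format messages have no fields, selecting a format means including an empty object under its name. The sketch below builds such a config and models the rule that the native format is used when no format field is set. Both helper names are hypothetical, and formats_written is only a local reading of the field descriptions above, not a service call.

```python
FORMATS = ("native", "vtt", "srt")


def build_output_format_config(formats):
    """Return an OutputFormatConfig dict selecting the given formats."""
    unknown = set(formats) - set(FORMATS)
    if unknown:
        raise ValueError(f"unknown output formats: {sorted(unknown)}")
    # Each format message has no fields, so an empty object selects it.
    return {name: {} for name in FORMATS if name in formats}


def formats_written(output_format_config):
    """List the formats that would be written for a given config (local model)."""
    chosen = [name for name in FORMATS if name in output_format_config]
    # Native is the fallback when no output format field is set.
    return chosen or ["native"]


subtitles_only = build_output_format_config(["vtt", "srt"])
print(subtitles_only)
print(formats_written(subtitles_only))
print(formats_written({}))
```

Note that requesting only subtitle formats drops the native output in this model; to keep it, native has to be listed explicitly alongside vtt or srt.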

NativeOutputFileFormatConfig

This type has no fields.

Output configurations for serialized BatchRecognizeResults protos.

VttOutputFileFormatConfig

This type has no fields.

Output configurations for a WebVTT formatted subtitle file.

SrtOutputFileFormatConfig

This type has no fields.

Output configurations for a SubRip Text formatted subtitle file.

ProcessingStrategy

Possible processing strategies for batch requests.

Enums
PROCESSING_STRATEGY_UNSPECIFIED Default value for the processing strategy. The request is processed as soon as it is received.
DYNAMIC_BATCHING If selected, processes the request during lower utilization periods for a price discount. The request is fulfilled within 24 hours.