Choose a transcription function
This document provides a comparison of the transcription functions available in BigQuery ML, which are ML.GENERATE_TEXT and ML.TRANSCRIBE.
You can use the information in this document to help you decide which function to use in cases where the functions have overlapping capabilities.
At a high level, the difference between these functions is as follows:
- ML.GENERATE_TEXT is a good choice for transcription of audio clips that are 10 minutes or shorter, and you can also use it to perform natural language processing (NLP) tasks. Audio transcription with ML.GENERATE_TEXT is less expensive than with ML.TRANSCRIBE when you use the gemini-1.5-flash model. A sample transcription query follows this list.
- ML.TRANSCRIBE is a good choice for performing transcription on audio clips that are longer than 10 minutes. It also supports a wider range of languages than ML.GENERATE_TEXT.
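For example, the following query is a minimal sketch of audio transcription with ML.GENERATE_TEXT. It assumes that you have already created a remote model named mydataset.gemini_model over a Gemini model and an object table named mydataset.audio_objects that points to your audio files; these names are placeholders, not part of the function.

```sql
-- Hypothetical names: replace mydataset.gemini_model and
-- mydataset.audio_objects with your own remote model and object table.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.audio_objects`,
  STRUCT(
    'Transcribe this audio clip.' AS prompt,  -- prompt applied to each audio file
    TRUE AS flatten_json_output               -- return the generated text as a flat column
  )
);
```

With flatten_json_output set to TRUE, the generated transcript is returned in the ml_generate_text_llm_result column.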
Supported models
Supported models are as follows:
- ML.GENERATE_TEXT: you can use a subset of the Vertex AI Gemini models to generate text. For more information on supported models, see the ML.GENERATE_TEXT syntax.
- ML.TRANSCRIBE: you use the default model of the Speech-to-Text API. Using the Speech-to-Text API also gives you access to transcription with the Chirp speech model. Example statements that create remote models for both functions follow this list.
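As a rough sketch, remote models for the two functions might be created as follows. The dataset, connection, endpoint, and recognizer values are assumptions for illustration; check the CREATE MODEL documentation for the current option names and supported endpoints.

```sql
-- Hypothetical example: a remote model over a Gemini endpoint, for ML.GENERATE_TEXT.
CREATE OR REPLACE MODEL `mydataset.gemini_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (ENDPOINT = 'gemini-1.5-flash');

-- Hypothetical example: a remote model over the Speech-to-Text API, for ML.TRANSCRIBE.
-- The recognizer path is a placeholder.
CREATE OR REPLACE MODEL `mydataset.transcribe_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (
    REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
    SPEECH_RECOGNIZER = 'projects/my-project/locations/us/recognizers/my-recognizer'
  );
```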
Supported tasks
Supported tasks are as follows:
- ML.GENERATE_TEXT: you can perform audio transcription and natural language processing (NLP) tasks.
- ML.TRANSCRIBE: you can perform audio transcription. A sample ML.TRANSCRIBE query follows this list.
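The following is a minimal sketch of a transcription query with ML.TRANSCRIBE, assuming the remote model and object table names used in the earlier examples. The recognition_config argument is optional and maps to Speech-to-Text RecognitionConfig fields; the values shown here are illustrative.

```sql
-- Hypothetical names: replace the model and object table with your own.
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `mydataset.transcribe_model`,
  TABLE `mydataset.audio_objects`,
  recognition_config => (JSON '{"language_codes": ["en-US"], "model": "chirp", "auto_decoding_config": {}}')
);
```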
Pricing
Pricing is as follows:
- ML.GENERATE_TEXT: for pricing of the Vertex AI models that you use with this function, see Vertex AI pricing. Supervised tuning of supported models is charged at dollars per node hour. For more information, see Vertex AI custom training pricing.
- ML.TRANSCRIBE: for pricing of the Cloud AI service that you use with this function, see Speech-to-Text API pricing.
Supervised tuning
Supervised tuning support is as follows:
- ML.GENERATE_TEXT: supervised tuning is supported for some models.
- ML.TRANSCRIBE: supervised tuning isn't supported.
Queries per minute (QPM) limit
QPM limits are as follows:
- ML.GENERATE_TEXT: 60 QPM in the default us-central1 region for gemini-1.5-pro models, and 200 QPM in the default us-central1 region for gemini-1.5-flash models. For more information, see Generative AI on Vertex AI quotas.
- ML.TRANSCRIBE: 900 QPM per project. For more information, see Quotas and limits.
To increase your quota, see Request a higher quota.
Token limit
Token limits are as follows:
- ML.GENERATE_TEXT: 700 input tokens and 8,196 output tokens. This output token limit means that ML.GENERATE_TEXT has a limit of approximately 39 minutes for an individual audio clip.
- ML.TRANSCRIBE: no token limit. However, this function does have a limit of 480 minutes for an individual audio clip.
Supported languages
Supported languages are as follows:
- ML.GENERATE_TEXT: supports the same languages as Gemini.
- ML.TRANSCRIBE: supports all of the Speech-to-Text supported languages.
Region availability
Region availability is as follows:
- ML.GENERATE_TEXT: available in all Generative AI for Vertex AI regions.
- ML.TRANSCRIBE: available in the EU and US multi-regions for all speech recognizers.