Choose a transcription function

This document compares the transcription functions available in BigQuery ML: ML.GENERATE_TEXT and ML.TRANSCRIBE.

Use this comparison to decide which function to use in cases where their capabilities overlap.

At a high level, the difference between these functions is as follows:

ML.GENERATE_TEXT is a good choice for transcription of audio clips that are 10 minutes or shorter, and you can also use it to perform natural language processing (NLP) tasks. Audio transcription with ML.GENERATE_TEXT is less expensive than with ML.TRANSCRIBE when you use the gemini-1.5-flash model.
ML.TRANSCRIBE is a good choice for performing transcription on audio clips that are longer than 10 minutes. It also supports a wider range of languages than ML.GENERATE_TEXT.
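The duration guideline above can be sketched as a small helper. This is only an illustration of the rule of thumb in this document, not part of BigQuery ML: the function name `recommend_function` and the constant `SHORT_CLIP_MAX_MINUTES` are hypothetical, and the sketch ignores other factors such as language support, pricing, and NLP requirements.

```python
# Guideline from this document: clips of 10 minutes or shorter suit
# ML.GENERATE_TEXT; longer clips suit ML.TRANSCRIBE.
# The names below are hypothetical and exist only for this illustration.
SHORT_CLIP_MAX_MINUTES = 10.0


def recommend_function(duration_minutes: float) -> str:
    """Return the function this document suggests for a clip of the given length."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    if duration_minutes <= SHORT_CLIP_MAX_MINUTES:
        return "ML.GENERATE_TEXT"
    return "ML.TRANSCRIBE"


if __name__ == "__main__":
    for minutes in (2.5, 10, 45):
        print(f"{minutes} min -> {recommend_function(minutes)}")
```

In practice, you might combine this check with the language and pricing considerations described in the rest of this document before settling on a function.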

Supported models

Supported models are as follows:

ML.GENERATE_TEXT: you can use a subset of the Vertex AI Gemini models to generate text. For more information on supported models, see the ML.GENERATE_TEXT syntax.
ML.TRANSCRIBE: you use the default model of the Speech-to-Text API. Using the Speech-to-Text API gives you access to transcription with the Chirp speech model.

Supported tasks are as follows:

ML.GENERATE_TEXT: you can perform audio transcription and natural language processing (NLP) tasks.
ML.TRANSCRIBE: you can perform audio transcription.

Pricing is as follows:

ML.GENERATE_TEXT: for pricing of the Vertex AI models that you use with this function, see Vertex AI pricing. Supervised tuning of supported models is charged in dollars per node hour. For more information, see Vertex AI custom training pricing.
ML.TRANSCRIBE: for pricing of the Cloud AI service that you use with this function, see Speech-to-Text API pricing.

Supervised tuning support is as follows:

QPM limits are as follows:

ML.GENERATE_TEXT: 60 QPM in the default us-central1 region for gemini-1.5-pro models, and 200 QPM in the default us-central1 region for gemini-1.5-flash models. For more information, see Generative AI on Vertex AI quotas.
ML.TRANSCRIBE: 900 QPM per project. For more information, see Quotas and limits.

To increase your quota, see Request a quota adjustment.
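To get a feel for what the QPM limits above mean for a batch job, the following sketch computes a lower bound on how long a given number of requests would take at each quoted limit. It assumes one request per audio clip and that the QPM limit is the only bottleneck; both are simplifying assumptions, and the helper name `min_minutes_for` is hypothetical.

```python
import math

# QPM figures quoted in this document. The ML.GENERATE_TEXT values are the
# defaults for the us-central1 region.
QPM_LIMITS = {
    "ML.GENERATE_TEXT (gemini-1.5-pro)": 60,
    "ML.GENERATE_TEXT (gemini-1.5-flash)": 200,
    "ML.TRANSCRIBE": 900,
}


def min_minutes_for(request_count: int, qpm: int) -> int:
    """Lower bound on the minutes needed to send request_count requests at qpm.

    Assumes the quota is the only constraint, which is a simplification.
    """
    if request_count < 0:
        raise ValueError("request_count must be non-negative")
    if qpm <= 0:
        raise ValueError("qpm must be positive")
    return math.ceil(request_count / qpm)


if __name__ == "__main__":
    for name, qpm in QPM_LIMITS.items():
        print(f"{name}: at least {min_minutes_for(9000, qpm)} min for 9000 requests")
```

For 9,000 requests, this gives at least 150 minutes at 60 QPM, 45 minutes at 200 QPM, and 10 minutes at 900 QPM. Actual run times depend on processing time per clip and on any quota adjustments in your project.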

Token limits are as follows:

ML.GENERATE_TEXT: 700 input tokens, and 8196 output tokens. This output token limit means that ML.GENERATE_TEXT has a limit of approximately 39 minutes for an individual audio clip.
ML.TRANSCRIBE: no token limit. However, this function does have a limit of 480 minutes for an individual audio clip.
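The per-clip duration limits above can be checked before you submit audio for transcription. The following is a minimal sketch that lists which functions can accept a clip of a given length. The 39-minute figure is the approximate limit stated in this document, so treat it as a rough bound; the names `MAX_CLIP_MINUTES` and `functions_that_fit` are hypothetical.

```python
from typing import List

# Per-clip duration limits stated in this document, in minutes.
# The ML.GENERATE_TEXT value is approximate because it is derived from the
# output token limit.
MAX_CLIP_MINUTES = {
    "ML.GENERATE_TEXT": 39,
    "ML.TRANSCRIBE": 480,
}


def functions_that_fit(duration_minutes: float) -> List[str]:
    """Return the functions whose per-clip limit covers the given duration."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    return [
        name
        for name, limit in MAX_CLIP_MINUTES.items()
        if duration_minutes <= limit
    ]


if __name__ == "__main__":
    for minutes in (5, 60, 500):
        print(f"{minutes} min -> {functions_that_fit(minutes)}")
```

An empty result means the clip exceeds both limits, in which case you would likely need to split the audio into shorter segments before transcription.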

Supported languages are as follows:

Region availability is as follows:

ML.GENERATE_TEXT: available in all Generative AI on Vertex AI regions.
ML.TRANSCRIBE: available in the EU and US multi-regions for all speech recognizers.