Choose a transcription function

This document provides a comparison of the transcription functions available in BigQuery ML, which are ML.GENERATE_TEXT and ML.TRANSCRIBE.

You can use the information in this document to help you decide which function to use in cases where the functions have overlapping capabilities.

At a high level, the difference between these functions is as follows:

  • ML.GENERATE_TEXT is a good choice for transcribing audio clips that are 10 minutes or shorter, and you can also use it to perform natural language processing (NLP) tasks. When you use the gemini-1.5-flash model, audio transcription with ML.GENERATE_TEXT is less expensive than with ML.TRANSCRIBE.

  • ML.TRANSCRIBE is a good choice for transcribing audio clips that are longer than 10 minutes. It also supports a wider range of languages than ML.GENERATE_TEXT.
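
The duration guidance above can be expressed as a small helper. This is a minimal sketch, not part of BigQuery ML: the function name recommend_function and the constant SHORT_CLIP_MAX_MINUTES are hypothetical, and the sketch considers only clip length, not language coverage, pricing, or NLP needs.

```python
# Hypothetical helper that encodes the 10-minute rule of thumb from this document.
SHORT_CLIP_MAX_MINUTES = 10  # threshold taken from the guidance above


def recommend_function(clip_minutes: float) -> str:
    """Return the function suggested for a clip of the given length."""
    if clip_minutes <= 0:
        raise ValueError("clip_minutes must be positive")
    # Clips of 10 minutes or shorter: ML.GENERATE_TEXT is the suggested choice.
    if clip_minutes <= SHORT_CLIP_MAX_MINUTES:
        return "ML.GENERATE_TEXT"
    # Longer clips: ML.TRANSCRIBE is the suggested choice.
    return "ML.TRANSCRIBE"
```

For example, recommend_function(4.5) returns "ML.GENERATE_TEXT" and recommend_function(25) returns "ML.TRANSCRIBE". Other factors in this document, such as the required language, might still change the decision.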

Supported models

Supported models are as follows:

Supported tasks

Supported tasks are as follows:

  • ML.GENERATE_TEXT: you can perform audio transcription and natural language processing (NLP) tasks.
  • ML.TRANSCRIBE: you can perform audio transcription.
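
To illustrate how the two functions are invoked for transcription, the following sketch builds a query string for each one. It assumes that both functions take a remote model and an object table of audio files, that the ML.TRANSCRIBE model was created with a default speech recognizer (otherwise a RECOGNITION_CONFIG argument is also needed), and that the ML.GENERATE_TEXT prompt is passed in the STRUCT argument. The helper names and every model and table name are hypothetical; check the function reference pages for the complete syntax.

```python
def transcribe_query(model: str, object_table: str) -> str:
    """Build an ML.TRANSCRIBE query (assumes the model has a default recognizer)."""
    return (
        "SELECT *\n"
        "FROM ML.TRANSCRIBE(\n"
        f"  MODEL `{model}`,\n"
        f"  TABLE `{object_table}`\n"
        ")"
    )


def generate_text_query(model: str, object_table: str, prompt: str) -> str:
    """Build an ML.GENERATE_TEXT query that sends a prompt with each audio file."""
    # Escape backslashes and single quotes so the prompt is a valid SQL string literal.
    escaped = prompt.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model}`,\n"
        f"  TABLE `{object_table}`,\n"
        f"  STRUCT('{escaped}' AS prompt, TRUE AS flatten_json_output)\n"
        ")"
    )


# Placeholder dataset, model, and table names, for illustration only.
print(transcribe_query("mydataset.transcribe_model", "mydataset.audio_files"))
print(generate_text_query("mydataset.gemini_model", "mydataset.audio_files",
                          "Transcribe this audio clip."))
```

Because the ML.GENERATE_TEXT prompt is free text, the same call shape can request a summary or another NLP task instead of a transcript, which reflects the difference in supported tasks listed above.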

Pricing

Pricing is as follows:

Supervised tuning

Supervised tuning support is as follows:

  • ML.GENERATE_TEXT: supervised tuning is supported for some models.
  • ML.TRANSCRIBE: supervised tuning isn't supported.

Queries per minute (QPM) limit

QPM limits are as follows:

  • ML.GENERATE_TEXT: 60 QPM in the default us-central1 region for gemini-1.5-pro models, and 200 QPM in the default us-central1 region for gemini-1.5-flash models. For more information, see Generative AI on Vertex AI quotas.
  • ML.TRANSCRIBE: 900 QPM per project. For more information, see Quotas and limits.

To increase your quota, see Request a higher quota.
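
As a rough illustration of what these limits mean for a batch job, the following sketch estimates the minimum number of minutes needed to process a set of clips. It assumes one request per clip and that the job sustains the full quota; the helper name min_minutes_to_process is hypothetical, and actual throughput also depends on clip length and processing time.

```python
import math

# Default QPM limits quoted in this document.
QPM_LIMITS = {
    "ML.GENERATE_TEXT (gemini-1.5-pro)": 60,
    "ML.GENERATE_TEXT (gemini-1.5-flash)": 200,
    "ML.TRANSCRIBE": 900,
}


def min_minutes_to_process(num_clips: int, qpm: int) -> int:
    """Lower bound on minutes needed, assuming one request per clip."""
    if num_clips < 0 or qpm <= 0:
        raise ValueError("num_clips must be >= 0 and qpm must be > 0")
    return math.ceil(num_clips / qpm)


for name, qpm in QPM_LIMITS.items():
    print(name, min_minutes_to_process(1800, qpm))
```

For 1,800 clips, the quota alone implies at least 30 minutes with gemini-1.5-pro, 9 minutes with gemini-1.5-flash, and 2 minutes with ML.TRANSCRIBE.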

Token limit

Token limits are as follows:

  • ML.GENERATE_TEXT: 700 input tokens and 8196 output tokens. This output token limit means that ML.GENERATE_TEXT has a limit of approximately 39 minutes for an individual audio clip.
  • ML.TRANSCRIBE: no token limit. However, this function does have a limit of 480 minutes for an individual audio clip.
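
If a recording exceeds the per-clip limit, one option is to split it into segments before transcription. The following sketch computes how many segments a recording needs under each limit. The helper name segments_needed is hypothetical, the 39-minute figure is the approximate value stated above rather than an exact cutoff, and splitting the audio itself is outside the scope of the sketch.

```python
import math

# Per-clip limits in minutes, as stated in this document.
# The ML.GENERATE_TEXT value is approximate.
MAX_CLIP_MINUTES = {
    "ML.GENERATE_TEXT": 39,
    "ML.TRANSCRIBE": 480,
}


def segments_needed(clip_minutes: float, function_name: str) -> int:
    """Return how many segments keep each piece within the per-clip limit."""
    if clip_minutes <= 0:
        raise ValueError("clip_minutes must be positive")
    limit = MAX_CLIP_MINUTES[function_name]
    return math.ceil(clip_minutes / limit)
```

For example, a 90-minute recording needs 3 segments for ML.GENERATE_TEXT but fits in a single ML.TRANSCRIBE call.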

Supported languages

Supported languages are as follows:

Region availability

Region availability is as follows:

  • ML.GENERATE_TEXT: available in all Generative AI for Vertex AI regions.
  • ML.TRANSCRIBE: available in the EU and US multi-regions for all speech recognizers.