Convert speech to text

This document shows you how to use Vertex AI Studio to convert speech to text. The speech-to-text feature in Vertex AI Studio is a quick way to transcribe short audio files. For more advanced features and higher limits, you can use the dedicated Speech-to-Text service.

The following table compares the two services.

Tool | Description | Use case
--- | --- | ---
Vertex AI Studio | A quick way to transcribe short audio files using the Chirp model directly in the studio. | Best for quick tests and transcribing audio files under 60 seconds.
Speech-to-Text | A dedicated service with more models, advanced features, and support for much longer audio files. | Suitable for production workloads and transcribing files up to 8 hours long.

To learn how to convert text to speech, see Convert text to speech.

Convert speech to text

To convert speech to text, follow these steps:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. Click Generate speech.

  3. Select the Speech-to-text tab.

  4. In the Speech section, click Browse to select the audio file that you want to convert to text.

  5. In the Language list, select the language of the speech in the audio file.

  6. Click Submit.

    The converted text appears in the Text field.

Limitations

  • Audio files can be at most 60 seconds long and at most 10 MB in size, whichever limit is reached first.
  • Files are transcribed with the Chirp model.
  • Only 16-bit linear PCM WAV files are supported.

To work around these limitations, use the Speech-to-Text UI directly.
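
The limits listed above can be checked locally before you upload a file. The following is a minimal sketch that uses only the Python standard library; it is not part of Vertex AI Studio. The function name check_studio_limits is hypothetical, and the sketch assumes that 10 MB means 10 × 1024 × 1024 bytes.

```python
import os
import wave

# Limits taken from the Limitations list above.
MAX_SECONDS = 60
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is read as 10 MiB
REQUIRED_SAMPLE_WIDTH = 2     # 16-bit samples are 2 bytes each


def check_studio_limits(path):
    """Return a list of problems; an empty list means no limit was exceeded."""
    problems = []

    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes; limit is {MAX_BYTES}")

    try:
        with wave.open(path, "rb") as wav:
            sample_width = wav.getsampwidth()
            frame_rate = wav.getframerate()
            frame_count = wav.getnframes()
    except (wave.Error, EOFError) as exc:
        # The standard wave module rejects files that are not
        # uncompressed PCM WAV, so a failure here means the wrong format.
        problems.append(f"not a readable PCM WAV file: {exc}")
        return problems

    if sample_width != REQUIRED_SAMPLE_WIDTH:
        problems.append(f"samples are {sample_width * 8}-bit; 16-bit is required")

    # Duration in seconds = number of frames / frames per second.
    duration = frame_count / frame_rate if frame_rate else 0.0
    if duration > MAX_SECONDS:
        problems.append(f"audio is {duration:.1f} s long; limit is {MAX_SECONDS} s")

    return problems
```

Calling check_studio_limits("recording.wav") (a placeholder file name) returns an empty list when the file fits within the limits. Otherwise, each entry describes one limit that the file exceeds, which suggests using the Speech-to-Text service instead.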

What's next

  • For more models, advanced features, and the ability to transcribe files up to 8 hours long, see Speech-to-Text.