Speech-to-Text enables easy integration of Google speech recognition technologies into your solution. Speech-to-Text is a machine learning (ML) technology that gives you full control over your infrastructure and your protected speech data to meet data residency and compliance requirements.
The following table describes the key capabilities of Speech-to-Text.
Capability | Description |
---|---|
Transcription | Applies advanced deep learning neural network algorithms from Google to automatic speech recognition. |
Models | Deploys models that are less than 1 GB in size and consume minimal resources. |
API compatible | Uses an API that is fully compatible with the Speech-to-Text API and its client libraries. |
For a list of supported audio encoding formats, see AudioEncoding of Speech-to-Text.
Before you begin
To get the permissions you need to use the Vertex AI Speech-to-Text pre-trained API, ask your Project IAM Admin to grant you the AI Speech Developer (ai-speech-developer) role in your project namespace.
How to use the Speech-to-Text client library
Follow these steps to use the Speech-to-Text client library with Python:
- Open a notebook as your coding environment. If you don't have an existing notebook, create a notebook.
- Write your code using Python to install the Speech-to-Text library from a tar file and get a transcription. The following code sample shows how to import the Speech-to-Text client library and transcribe an audio file.
- Run your code to generate a Speech-to-Text transcription.
# Import the Speech-to-Text client library.
from google.cloud import speech

# Instantiate a client.
client = speech.SpeechClient()

# Specify the audio file to transcribe.
audio_uri = "YOUR_AUDIO_TO_TRANSCRIBE"
audio = speech.RecognitionAudio(uri=audio_uri)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
    language_code="LANGUAGE_CODE",
)

# Detect speech in the audio file.
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
Replace YOUR_AUDIO_TO_TRANSCRIBE with the URI of the audio file to transcribe, and replace LANGUAGE_CODE with a supported language code.
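The sample configuration above declares LINEAR16 audio at 16000 Hz with one channel, and the audio you send must match those values. The following is a minimal sketch of a pre-flight check, assuming your audio is stored as a PCM WAV file; raw, headerless PCM files carry no header to inspect. It uses only Python's standard `wave` module. The function name `check_wav_matches_config` and the three constants are hypothetical helpers and are not part of the Speech-to-Text client library.

```python
import wave

# Values taken from the sample RecognitionConfig (hypothetical constant names).
EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # LINEAR16 is 16-bit PCM, so 2 bytes per sample


def check_wav_matches_config(path):
    """Return a list of mismatches between a WAV header and the sample config.

    An empty list means the header agrees with the configuration.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, "
                f"expected {EXPECTED_SAMPLE_RATE_HZ} Hz"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"channel count is {wav.getnchannels()}, "
                f"expected {EXPECTED_CHANNELS}"
            )
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, "
                f"expected {EXPECTED_SAMPLE_WIDTH_BYTES} bytes"
            )
    return problems
```

If the returned list is not empty, either convert the audio or adjust `sample_rate_hertz` and `audio_channel_count` in the configuration so that they describe the file you actually send.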
Sample of the Speech-to-Text client library
To transcribe an audio file using the Speech-to-Text API, first view the statuses and endpoints of the pre-trained models to identify your endpoint. Then, follow the sample code:
# Example values:
# api_endpoint = '0.0.0.0:10000'
# local_file_path = '../resources/two_channel_16k.raw'
from google.cloud import speech_v1p1beta1
import grpc
import io


def transcribe(local_file_path, api_endpoint):
    # Connect to the pre-trained model endpoint over an unencrypted gRPC channel.
    transport = speech_v1p1beta1.services.speech.transports.SpeechGrpcTransport(
        channel=grpc.insecure_channel(target=api_endpoint)
    )
    client = speech_v1p1beta1.SpeechClient(transport=transport)

    config = {
        "encoding": speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16,
        "language_code": "LANGUAGE_CODE",
        "sample_rate_hertz": 16000,
        "audio_channel_count": 1,
    }

    # Read the local audio file and send its bytes inline with the request.
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(request={"config": config, "audio": audio})
    # Return the response so that the caller can read the transcripts.
    return response
Replace LANGUAGE_CODE with a supported language code.
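The response from `client.recognize` holds a list of results, and each result holds a ranked list of alternatives, as the earlier client library sample shows when it prints `result.alternatives[0].transcript`. The following is a small sketch of a helper that collects the top transcript of every result. The name `top_transcripts` is hypothetical, and the `SimpleNamespace` object is only a stand-in with the same attribute shape as a real response, so that the sketch runs without a live endpoint.

```python
from types import SimpleNamespace


def top_transcripts(response):
    """Return the first (highest-ranked) transcript of each result."""
    transcripts = []
    for result in response.results:
        # Skip results that carry no alternatives instead of raising IndexError.
        if result.alternatives:
            transcripts.append(result.alternatives[0].transcript)
    return transcripts


# Stand-in for a recognize response (assumption: illustration only).
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello world")]),
        SimpleNamespace(alternatives=[]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="second segment")]),
    ]
)

print(top_transcripts(fake_response))
```

With a real response, pass the object returned by `client.recognize` to the helper in place of `fake_response`.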
Supported languages
The following languages are supported by Speech-to-Text:
Language | Language code |
---|---|
Arabic (Egypt) | ar-EG |
Arabic (Levantine) | ar-x-levant |
Arabic (Maghrebi) | ar-x-maghrebi |
Arabic (Peninsular Gulf) | ar-x-gulf |
Chinese, Mandarin (Simplified, China) | cmn-hans-cn |
English (United States) | en-US |
French (France) | fr-FR |
German (Germany) | de-DE |
Korean (South Korea) | ko-KR |
Portuguese (Brazil) | pt-BR |
Russian (Russia) | ru-RU |
Spanish (United States) | es-US |
Ukrainian (Ukraine) | uk-UA |
Urdu (Pakistan) | ur-PK |
Persian (Iran) | fa-IR |
Swahili | sw |
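Because a request needs one of the language codes from the table above, it can help to validate the value before you build a configuration. The following is a minimal sketch that copies the codes from the table into a set and resolves user input to the table's spelling. The names `SUPPORTED_LANGUAGE_CODES` and `resolve_language_code` are hypothetical, and the case-insensitive matching is an assumption made for convenience; this document doesn't state how the service treats letter case.

```python
# Language codes copied from the supported languages table.
SUPPORTED_LANGUAGE_CODES = {
    "ar-EG", "ar-x-levant", "ar-x-maghrebi", "ar-x-gulf",
    "cmn-hans-cn", "en-US", "fr-FR", "de-DE",
    "ko-KR", "pt-BR", "ru-RU", "es-US",
    "uk-UA", "ur-PK", "fa-IR", "sw",
}

# Map each lowercased code to the spelling used in the table.
_CANONICAL = {code.lower(): code for code in SUPPORTED_LANGUAGE_CODES}


def resolve_language_code(code):
    """Return the table's spelling of a supported code, or raise ValueError."""
    key = code.strip().lower()
    if key not in _CANONICAL:
        raise ValueError(f"Unsupported language code: {code!r}")
    return _CANONICAL[key]
```

For example, `resolve_language_code("en-us")` returns `"en-US"`, and a code that is missing from the table, such as `"ja-JP"`, raises `ValueError` before any request is sent.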