Chirp is the next generation of Speech-to-Text models on Google Distributed Cloud (GDC) air-gapped. A version of a Universal Speech Model, Chirp has over 2 billion parameters and can transcribe many languages in a single model.
By enabling the Chirp component, you can transcribe audio in additional languages that Speech-to-Text doesn't support by default.
Chirp achieves state-of-the-art Word Error Rate (WER) on a variety of public test sets and languages, offering multi-language support on Distributed Cloud. It uses a universal encoder, trained on data in many different languages with a different architecture than current speech models, and then fine-tuned to offer transcription for specific languages. A single model unifies data from multiple languages; however, users still specify the language in which the model should recognize speech.
Chirp processes speech in much larger chunks than other models do, and results are only available after an entire utterance has finished. This means it might not be suitable for true real-time use.
Chirp is available in the Speech-to-Text pre-trained API. The model identifier for Chirp is chirp. Therefore, in the Distributed Cloud implementation of Speech-to-Text, you can set the value chirp in the model field of the RecognitionConfig message in your request.
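For example, with the Python client library, selecting Chirp looks like the following minimal sketch; a complete request also needs the audio encoding and sample-rate fields, shown in the samples later on this page:
# A minimal sketch of selecting the Chirp model in a recognition request.
from google.cloud import speech

config = speech.RecognitionConfig(
    language_code="en-US",  # Any supported language code.
    model="chirp",          # Select the Chirp model.
)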
Available API methods
Chirp supports both the Speech.Recognize and Speech.StreamingRecognize API methods.
The difference between the two methods is that StreamingRecognize only returns results after each utterance. For this reason, StreamingRecognize has a latency on the order of seconds rather than milliseconds after speech starts, compared to the Recognize method. However, StreamingRecognize has very low latency after an utterance finishes, for example, in a sentence followed by a pause.
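To illustrate, the following is a minimal sketch of streaming recognition with the Python client library. The client and config objects are assumed to be set up as in the samples later on this page, and audio_chunks is a hypothetical generator that yields raw LINEAR16 byte chunks:
# A minimal streaming-recognition sketch (assumes client, config, and
# audio_chunks are provided by the caller).
from google.cloud import speech

def stream_transcribe(client, config, audio_chunks):
    # Wrap the RecognitionConfig in a streaming configuration.
    streaming_config = speech.StreamingRecognitionConfig(config=config)
    # Each request carries one chunk of raw audio bytes.
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in audio_chunks
    )
    responses = client.streaming_recognize(config=streaming_config, requests=requests)
    # Results arrive once each utterance completes, not word by word.
    for response in responses:
        for result in response.results:
            print("Transcript: {}".format(result.alternatives[0].transcript))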
Before you begin
Before using Chirp on Distributed Cloud, follow these steps:
- Ask your Project IAM Admin to grant you the AI Speech Developer (ai-speech-developer) role in your project namespace.
- Enable the pre-trained APIs before using the client library.
How to use Chirp
Work through the following steps to use Chirp as a model with the Speech-to-Text client library in Python:
- Open a notebook as your coding environment. If you don't have an existing notebook, create one.
- Install the Speech-to-Text client library from the provided tar file.
- Import the client library and transcribe an audio file, as in the following sample:
# Import the Speech-to-Text client library.
from google.cloud import speech

# Instantiate a client.
client = speech.SpeechClient()

# Specify the audio file to transcribe.
audio_uri = "YOUR_AUDIO_TO_TRANSCRIBE"
audio = speech.RecognitionAudio(uri=audio_uri)

# Pass the project ID in the request metadata.
metadata = [("x-goog-user-project", "projects/PROJECT_ID")]

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
    language_code="LANGUAGE_CODE",
    model="chirp",
)

# Detect speech in the audio file.
response = client.recognize(config=config, audio=audio, metadata=metadata)

for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
Replace YOUR_AUDIO_TO_TRANSCRIBE with the URI of the audio file, LANGUAGE_CODE with a supported language code, and PROJECT_ID with your project ID.
Sample of the Speech-to-Text client library
To transcribe an audio file using the Chirp model on the Speech-to-Text API, first view the statuses and endpoints of the pre-trained models to identify your endpoint. Then, follow the sample code:
# api_endpoint = '0.0.0.0:10000'
# local_file_path = '../resources/two_channel_16k.raw'

from google.cloud import speech_v1p1beta1
import grpc
import io

def transcribe(local_file_path, api_endpoint):
    # Create a gRPC transport that targets the Speech-to-Text endpoint.
    transport = speech_v1p1beta1.services.speech.transports.SpeechGrpcTransport(
        channel=grpc.insecure_channel(target=api_endpoint)
    )
    client = speech_v1p1beta1.SpeechClient(transport=transport)
    config = {
        "encoding": speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16,
        "language_code": "LANGUAGE_CODE",
        "sample_rate_hertz": 16000,
        "audio_channel_count": 1,
        "model": "chirp",
    }
    # Read the local audio file into the request.
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}
    # Pass the project ID in the request metadata.
    metadata = [("x-goog-user-project", "projects/PROJECT_ID")]
    response = client.recognize(
        request={"config": config, "audio": audio}, metadata=metadata
    )
    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))
Replace LANGUAGE_CODE with a supported language code and PROJECT_ID with your project ID.
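For example, you can call the function with the endpoint and file path shown in the commented-out values at the top of the sample:
# Transcribe a local raw audio file against a locally reachable endpoint.
transcribe('../resources/two_channel_16k.raw', '0.0.0.0:10000')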
Supported languages
The following languages are supported by Chirp:
| Language | Language code |
|---|---|
| English (United States) | en-US |
| Indonesian (Indonesia) | id-ID |
| Malay (Malaysia) | ms-MY |