Try Speech-to-Text

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Speech-to-Text pre-trained API on Google Distributed Cloud (GDC) air-gapped.

Before you begin

Follow these steps before trying Speech-to-Text:

Set up a project using the GDC console to group the Vertex AI services. For information about creating and using projects, see Create a project.
Ask your Project IAM Admin to grant you the AI Speech Developer (ai-speech-developer) role in your project namespace.
Enable the Speech-to-Text pre-trained API.
Download the gdcloud command-line interface (CLI).

Set up your service account

Create a service account and a service key for it. Replace SERVICE_ACCOUNT with the name of your service account, PROJECT_ID with your project ID, and SERVICE_KEY with the name of your service key file.

  ${HOME}/gdcloud init  # set URI and project

  ${HOME}/gdcloud auth login

  ${HOME}/gdcloud iam service-accounts create SERVICE_ACCOUNT  --project=PROJECT_ID

  ${HOME}/gdcloud iam service-accounts keys create "SERVICE_KEY".json --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT

Grant access to project resources

Grant your service account access to the Speech-to-Text API by providing your project ID, the name of your service account, and the role ai-speech-developer.

  ${HOME}/gdcloud iam service-accounts add-iam-policy-binding --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT --role=role/ai-speech-developer

Set your environment variables

Before running the Speech-to-Text pre-trained service, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to your service key file.

  export GOOGLE_APPLICATION_CREDENTIALS="SERVICE_KEY".json
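A quick check of the key file can catch a missing or malformed file before you request a token. The following is a minimal sketch and not part of the GDC tooling: the helper name check_credentials_file is hypothetical, and it assumes only that the service key is a JSON file, as the .json extension suggests. It does not validate the key itself.

```python
import json
import os


def check_credentials_file(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the parsed key file named by env_var, or raise a clear error.

    Hypothetical helper for illustration. It only checks that the variable
    is set and that the file it names contains a JSON object.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return data
```

Calling check_credentials_file() at the top of a script fails early with a readable message instead of a less specific error from the authentication library.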

Authenticate the request

You must get a token to authenticate requests to the Speech-to-Text pre-trained service. Use either the gdcloud CLI or Python to get the token:

gdcloud CLI

Export the identity token for the specified account to an environment variable:

export TOKEN="$($HOME/gdcloud auth print-identity-token --audiences=https://ENDPOINT)"

Replace ENDPOINT with the Speech-to-Text endpoint. For more information, see View service statuses and endpoints.

Python

Install the google-auth client library.
```
pip install google-auth
```

Save the following code to a Python script, and replace ENDPOINT with the Speech-to-Text endpoint. For more information, see View service statuses and endpoints.

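import google.auth
from google.auth.transport import requests

api_endpoint = "https://ENDPOINT"

# Load the credentials named by GOOGLE_APPLICATION_CREDENTIALS and
# set the Speech-to-Text endpoint as the token audience.
creds, project_id = google.auth.default()
creds = creds.with_gdch_audience(api_endpoint)

def test_get_token():
  # Refreshing the credentials fetches a new token.
  req = requests.Request()
  creds.refresh(req)
  print(creds.token)

if __name__ == "__main__":
  test_get_token()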

Run the script to fetch the token.
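If a request is later rejected, it can help to look inside the token that the script printed. The sketch below assumes the token is a JWT (three dot-separated base64url segments), which this guide does not confirm; if the token has another shape, the function returns None. The name peek_jwt_claims is hypothetical, and the function does not verify the signature, so use it for inspection only.

```python
import base64
import json


def peek_jwt_claims(token):
    """Return the claims of a JWT-shaped token without verifying it.

    Returns None when the token does not look like a JWT. Assumption: the
    token is a JWT. This sketch does NOT check the signature.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    try:
        return json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:
        # Covers invalid base64, invalid UTF-8, and invalid JSON.
        return None
```

For example, passing creds.token to peek_jwt_claims after the refresh would show whether the token carries an audience claim that matches https://ENDPOINT, if such a claim is present.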

Run the Speech-to-Text pre-trained API sample script

This example shows you how to interact with a Speech-to-Text pre-trained API.

Check whether there is a client library installed.
```
  pip freeze | grep speech
  # output example: google-cloud-speech==2.15.0
```
If the existing version doesn't match the client library in https://CONSOLE_ENDPOINT/.well-known/static/client-libraries, uninstall the client library using the following command:
```
  pip uninstall google-cloud-speech
```
Download the Speech-to-Text client library. Replace CONSOLE_ENDPOINT with your console endpoint.
```
   wget https://CONSOLE_ENDPOINT/.well-known/static/client-libraries/google-cloud-speech
```
Note: If the error message "x509: certificate signed by unknown authority" is displayed, your workstation doesn't trust the CA certificate used in Distributed Cloud. Follow your organization's procedure to check the trusted certificate store for your workstation.
Warning: Using --login-config-cert with an unverified certificate makes your workstation vulnerable to man-in-the-middle attacks. Ensure that you rely only on your workstation's trust store instead of trusting a CA certificate from unknown sources.
Extract the tar file, and install the client library using pip. Replace CLIENT_LIBRARY with the downloaded tar file and FOLDER with the extracted folder. If pip reports that a package can't be found, install the missing dependencies.
```
tar -xvzf CLIENT_LIBRARY

pip install -r FOLDER/requirements.txt --no-index --find-links FOLDER
```
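After the installation, you can confirm from Python which version of the client library is active. This is a minimal sketch that uses the standard importlib.metadata module as an alternative to pip freeze. The function name installed_version is hypothetical, and the package name google-cloud-speech is taken from the example output earlier in this section.

```python
from importlib import metadata


def installed_version(package="google-cloud-speech"):
    """Return the installed version of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```

Compare the printed version with the one published at https://CONSOLE_ENDPOINT/.well-known/static/client-libraries.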
Use the Speech-to-Text client library script to generate the token and make requests to the Speech-to-Text service.

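Set the GOOGLE_APPLICATION_CREDENTIALS environment variable in your script. Replace SERVICE_KEY with the name of your service key file.

import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "SERVICE_KEY.json"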

Speech-to-Text sample

Replace ENDPOINT with the Speech-to-Text endpoint that you use for your organization, CONTENT with the base64-encoded audio to transcribe, and PROJECT_ID with your project ID.

import base64

from google.cloud import speech_v1p1beta1
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
api_endpoint="ENDPOINT:443"

def get_client(creds):
  # Send requests to the Speech-to-Text endpoint set in api_endpoint.
  opts = ClientOptions(api_endpoint=api_endpoint)
  return speech_v1p1beta1.SpeechClient(credentials=creds, client_options=opts)

def main():
  creds = None
  try:
    creds, project_id = google.auth.default()
    # Set the token audience, then fetch a token.
    creds = creds.with_gdch_audience(audience)
    req = requests.Request()
    creds.refresh(req)
    print("Got token: ")
    print(creds.token)
  except Exception as e:
    print("Caught exception: " + str(e))
    raise
  return creds

def speech_func(creds):
  tc = get_client(creds)
  # CONTENT is the base64-encoded audio to transcribe.
  content = "CONTENT"

  audio = speech_v1p1beta1.RecognitionAudio()
  audio.content = base64.standard_b64decode(content)
  config = speech_v1p1beta1.RecognitionConfig()
  config.encoding = speech_v1p1beta1.RecognitionConfig.AudioEncoding.LINEAR16
  config.sample_rate_hertz = 16000
  config.language_code = "en-US"
  config.audio_channel_count = 1

  # Send the project with the request as metadata.
  metadata = [("x-goog-user-project", "projects/PROJECT_ID")]

  resp = tc.recognize(config=config, audio=audio, metadata=metadata)
  print(resp)

if __name__ == "__main__":
  creds = main()
  speech_func(creds)
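The sample expects CONTENT to be base64-encoded audio that matches its RecognitionConfig: LINEAR16 encoding, 16000 Hz, one channel. The following sketch shows one way to produce that string from a WAV file with the Python standard library. The helper name wav_to_base64 is hypothetical, and the sketch assumes that the service accepts the raw 16-bit PCM samples without the WAV header, which this guide does not state.

```python
import base64
import wave


def wav_to_base64(path, sample_rate=16000, channels=1):
    """Base64-encode the PCM samples of a WAV file.

    Hypothetical helper. The expected values mirror the RecognitionConfig
    in the sample: 16-bit samples (LINEAR16), 16000 Hz, one channel.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit (LINEAR16) samples")
        if wav.getframerate() != sample_rate:
            raise ValueError(f"expected {sample_rate} Hz audio")
        if wav.getnchannels() != channels:
            raise ValueError(f"expected {channels} channel(s)")
        # Read only the sample data, without the WAV header.
        frames = wav.readframes(wav.getnframes())
    return base64.standard_b64encode(frames).decode("ascii")
```

The returned string can replace the CONTENT placeholder. The checks raise a ValueError when the file doesn't match the configuration, so a mismatch is caught before the request is sent.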

What's next

Learn more about how to Transcribe audio.