使用客户端库将语音转写为文字

本页面介绍了如何使用Google Cloud 客户端库以您喜爱的编程语言向 Speech-to-Text 发送语音识别请求。

Speech-to-Text 能够将 Google 语音识别技术轻松集成到开发者应用中。您可以向 Speech-to-Text API 发送音频数据，然后该 API 会返回该音频文件的文字转录。如需详细了解该服务，请参阅 Speech-to-Text 基础知识。

准备工作

您必须先完成以下操作，然后才能向 Speech-to-Text API 发送请求。如需了解详情，请参阅准备工作页面。

在 Google Cloud 项目上启用 Speech-to-Text。
确保已针对 Speech-to-Text 启用结算功能。
安装 Google Cloud CLI。安装完成后，运行以下命令来初始化 Google Cloud CLI：
```
gcloud init
```
如果您使用的是外部身份提供方 (IdP)，则必须先使用联合身份登录 gcloud CLI。
If you're using a local shell, then create local authentication credentials for your user account:
```
gcloud auth application-default login
```
You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.
验证您是否拥有完成本指南所需的权限。如果您为此指南创建了新项目，则您已拥有所需的权限。
（可选）创建新的 Cloud Storage 存储桶以存储您的音频数据。

所需的角色

如需获得将语音转写为文本所需的权限，请让您的管理员为您授予项目的 Service Usage Consumer (roles/serviceusage.serviceUsageConsumer) IAM 角色。如需详细了解如何授予角色，请参阅管理对项目、文件夹和组织的访问权限。

您也可以通过自定义角色或其他预定义角色来获取所需的权限。

安装客户端库

Go

go get cloud.google.com/go/speech/apiv1

Java

If you are using Maven, add the following to your pom.xml file. For more information about BOMs, see The Google Cloud Platform Libraries BOM.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.68.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-speech</artifactId>
  </dependency>
</dependencies>

If you are using Gradle, add the following to your dependencies:

implementation 'com.google.cloud:google-cloud-speech:4.69.0'

If you are using sbt, add the following to your dependencies:

libraryDependencies += "com.google.cloud" % "google-cloud-speech" % "4.69.0"

If you're using Visual Studio Code, IntelliJ, or Eclipse, you can add client libraries to your project using the following IDE plugins:

The plugins provide additional functionality, such as key management for service accounts. Refer to each plugin's documentation for details.

Node.js

在安装库之前，请确保已经为 Node.js 开发准备好环境。

npm install @google-cloud/speech

Python

在安装库之前，请确保已经为 Python 开发准备好环境。

pip install --upgrade google-cloud-speech

发出音频转录请求

现在您可以使用 Speech-to-Text 将音频文件转录为文字。请使用以下代码向 Speech-to-Text API 发送 recognize 请求。

Go


// Sample speech-quickstart uses the Google Cloud Speech API to transcribe
// audio.
package main

import (
	"context"
	"fmt"
	"log"

	speech "cloud.google.com/go/speech/apiv1"
	"cloud.google.com/go/speech/apiv1/speechpb"
)

func main() {
	ctx := context.Background()

	// Creates a client.
	client, err := speech.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}
	defer client.Close()

	// The path to the remote audio file to transcribe.
	fileURI := "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

	// Detects speech in the audio file.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: fileURI},
		},
	})
	if err != nil {
		log.Fatalf("failed to recognize: %v", err)
	}

	// Prints the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
}

Java

// Imports the Google Cloud client library
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import java.util.List;

public class QuickstartSample {

  /** Demonstrates using the Speech API to transcribe an audio file. */
  public static void main(String... args) throws Exception {
    // Instantiates a client
    try (SpeechClient speechClient = SpeechClient.create()) {

      // The path to the audio file to transcribe
      String gcsUri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw";

      // Builds the sync recognize request
      RecognitionConfig config =
          RecognitionConfig.newBuilder()
              .setEncoding(AudioEncoding.LINEAR16)
              .setSampleRateHertz(16000)
              .setLanguageCode("en-US")
              .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

      // Performs speech recognition on the audio file
      RecognizeResponse response = speechClient.recognize(config, audio);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
      }
    }
  }
}

Node.js

在运行该示例之前，请确保已经为 Node.js 开发准备好环境。

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

async function quickstart() {
  // The path to the remote LINEAR16 file
  const gcsUri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw';

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code
  const audio = {
    uri: gcsUri,
  };
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  };
  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}
quickstart();

Python

在运行该示例之前，请确保已经为 Python 开发准备好环境。


# Imports the Google Cloud client library


from google.cloud import speech



def run_quickstart() -> speech.RecognizeResponse:
    # Instantiates a client
    client = speech.SpeechClient()

    # The name of the audio file to transcribe
    gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

    audio = speech.RecognitionAudio(uri=gcs_uri)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    # Detects speech in the audio file
    response = client.recognize(config=config, audio=audio)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

恭喜！您已向 Speech-to-Text 发送了您的第一个请求！

如果您收到来自 Speech-to-Text 的错误或空响应，请查看问题排查和纠错步骤。

清理

为避免因本页中使用的资源导致您的 Google Cloud 账号产生费用，请按照以下步骤操作。

使用 Google Cloud console 删除您不需要的项目。

后续步骤

练习转录短音频文件。
了解如何批量处理长音频文件以进行语音识别。
了解如何转录流式音频，例如来自麦克风的音频。
通过使用 Speech-to-Text 客户端库，以您选择的语言开始使用 Speech-to-Text。
上手体验示例应用。
如需了解关于最佳性能、准确度和其他方面的提示，请参阅最佳做法文档。