Chirp: Universal speech model

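Chirp is the next generation of Google's speech-to-text models. Representing the culmination of years of research, the first version of Chirp is now available for Speech-to-Text. We plan to improve Chirp and expand it to more languages and domains. For details, see our paper, Google USM.

We train Chirp models with a different architecture than our current speech models. A single model unifies data from multiple languages. However, you still need to specify the language in which the model should recognize speech. Chirp doesn't support some of the Google Speech features that other models have. For a complete list, see Feature support and limitations.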

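Model identifiers

Chirp is available in the Speech-to-Text API v2. You can use it like any other model.

The model identifier for Chirp is: chirp.

You can specify this model in synchronous or batch recognition requests.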

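Available API methods

Chirp processes speech in much larger chunks than other models do. This means it might not be suitable for true real-time use. Chirp is available through the following API methods:

v2 Speech.Recognize (well suited to short audio under 1 minute)
v2 Speech.BatchRecognize (well suited to long audio from 1 minute to 8 hours)

Chirp is not available through the following API methods: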

v2 Speech.StreamingRecognize
v1 Speech.StreamingRecognize
v1 Speech.Recognize
v1 Speech.LongRunningRecognize
v1p1beta1 Speech.StreamingRecognize
v1p1beta1 Speech.Recognize
v1p1beta1 Speech.LongRunningRecognize
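
The duration ranges listed for Speech.Recognize and Speech.BatchRecognize suggest a simple routing rule. The following is a minimal sketch of that rule, not part of the Speech-to-Text API: the function name `choose_chirp_method` and both constants are hypothetical, and sending audio of exactly one minute to BatchRecognize is an assumption based on the ranges given on this page.

```python
# Hypothetical helper: picks a v2 method from the audio duration.
# The thresholds mirror the ranges listed on this page.
ONE_MINUTE_S = 60
EIGHT_HOURS_S = 8 * 60 * 60


def choose_chirp_method(duration_s: float) -> str:
    """Return the v2 method name suited to audio of the given length."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if duration_s < ONE_MINUTE_S:
        # Short audio: a single synchronous request.
        return "Speech.Recognize"
    if duration_s <= EIGHT_HOURS_S:
        # Long audio: a batch request.
        return "Speech.BatchRecognize"
    raise ValueError("audio longer than 8 hours is outside the listed range")
```

For example, `choose_chirp_method(12.5)` selects the synchronous method, while a three-hour recording is routed to the batch method.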

Regions

Chirp is available in the following regions:

us-central1
europe-west4
asia-southeast1

For more information, see the languages page.
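
The Python samples on this page derive both the API endpoint and the recognizer path from the region name. A minimal sketch of that derivation follows. The helper names are hypothetical, and the `CHIRP_REGIONS` tuple is copied from the list above, so it may lag behind the regions where Chirp is actually offered.

```python
# Regions copied from the list on this page (an assumption that may go stale).
CHIRP_REGIONS = ("us-central1", "europe-west4", "asia-southeast1")


def chirp_endpoint(region: str) -> str:
    """Return the regional Speech-to-Text endpoint for a Chirp region."""
    if region not in CHIRP_REGIONS:
        raise ValueError(f"{region!r} is not in the listed Chirp regions")
    return f"{region}-speech.googleapis.com"


def default_recognizer(project_id: str, region: str) -> str:
    """Return the path of the default ("_") recognizer in a Chirp region."""
    chirp_endpoint(region)  # reuse the region check
    return f"projects/{project_id}/locations/{region}/recognizers/_"
```

The endpoint and the recognizer location should name the same region, which is why both helpers take the same `region` argument.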

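Languages

You can see the supported languages in the full language list.

Feature support and limitations

Chirp doesn't support some of the STT API features: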

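Confidence scores: The API returns a value, but it isn't a true confidence score.
Speech adaptation: No adaptation features are supported.
Diarization: Automatic diarization isn't supported.
Forced normalization: Not supported.
Word-level confidence: Not supported.
Language detection: Not supported.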

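Chirp supports the following features:

Automatic punctuation: The punctuation is predicted by the model. It can be disabled.
Word timings: Optionally returned.
Language-agnostic audio transcription: The model automatically infers the spoken language in your audio file and adds it to the results.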
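
One way to apply the lists above is to check a request configuration before sending it. The sketch below works on a plain dictionary rather than on the client library types. The function name is hypothetical, and the mapping from field names (`adaptation`, `transcript_normalization`, `diarization_config`, `enable_word_confidence`) to the features listed above is an assumption based on the v2 `RecognitionConfig` and `RecognitionFeatures` messages.

```python
# Assumed mapping from v2 config fields to the unsupported features above.
UNSUPPORTED_CONFIG_FIELDS = {
    "adaptation": "speech adaptation",
    "transcript_normalization": "forced normalization",
}
UNSUPPORTED_FEATURE_FIELDS = {
    "diarization_config": "diarization",
    "enable_word_confidence": "word-level confidence",
}


def _is_set(value) -> bool:
    """Treat None and False as unset; an empty message ({}) counts as set."""
    return value is not None and value is not False


def unsupported_chirp_options(config: dict) -> list[str]:
    """Return the unsupported features that a config dictionary asks for."""
    problems = [
        label
        for field, label in UNSUPPORTED_CONFIG_FIELDS.items()
        if _is_set(config.get(field))
    ]
    features = config.get("features") or {}
    problems += [
        label
        for field, label in UNSUPPORTED_FEATURE_FIELDS.items()
        if _is_set(features.get(field))
    ]
    return problems
```

A configuration that only turns off `enable_automatic_punctuation` or turns on `enable_word_time_offsets` passes this check, because both features are in the supported list.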

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Go to IAM
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
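Go to IAM
Select the project.
Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
In the Select a role list, select a role.
To grant additional roles, click Add another role and add each additional role.
Click Save.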

Install the Google Cloud CLI.

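Note: If you installed the gcloud CLI previously, make sure that you have the latest version by running gcloud components update.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: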

gcloud init

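Client libraries can use Application Default Credentials to easily authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see Authenticate for using client libraries.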

If you're using a local shell, then create local authentication credentials for your user account:
```
gcloud auth application-default login
```
You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Also make sure that you have installed the client library.
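
The Python samples on this page read the project ID from the GOOGLE_CLOUD_PROJECT environment variable with `os.getenv`, which returns `None` when the variable is unset. A small guard, sketched here under the assumption that failing early is preferable to sending a request for `projects/None`, makes that case explicit. The function name `require_project_id` is hypothetical.

```python
import os


def require_project_id(env_var: str = "GOOGLE_CLOUD_PROJECT") -> str:
    """Return the project ID from the environment, or fail with a clear error."""
    project_id = os.getenv(env_var, "").strip()
    if not project_id:
        raise RuntimeError(
            f"Set {env_var} to your Google Cloud project ID before running the samples."
        )
    return project_id
```

The samples could then use `PROJECT_ID = require_project_id()` in place of the bare `os.getenv` call.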

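Perform synchronous speech recognition with Chirp

The following example shows how to perform synchronous speech recognition on a local audio file using Chirp: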

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_chirp(
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp model of Google Cloud Speech-to-Text API.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.

    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
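
For audio longer than one minute, this page lists Speech.BatchRecognize as the method to use. The following is a sketch of a batch counterpart to the sample above, under several assumptions: the function name and the `timeout_s` default are hypothetical, the audio is assumed to already be in Cloud Storage, the language is assumed to be en-US, and results are requested inline rather than written to a bucket. Some results may carry no alternatives, so the loop checks before indexing.

```python
def transcribe_batch_chirp(
    gcs_uri: str,
    project_id: str,
    region: str = "us-central1",
    timeout_s: int = 600,
):
    """Transcribe a Cloud Storage audio file with Chirp through BatchRecognize."""
    # This sketch passes the audio as a Cloud Storage URI, not as local bytes.
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")

    # Imported here so that the argument check above runs even where the
    # client library is not installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{region}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp",
    )

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{project_id}/locations/{region}/recognizers/_",
        config=config,
        files=[cloud_speech.BatchRecognizeFileMetadata(uri=gcs_uri)],
        # Ask for the transcript in the operation response itself.
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # batch_recognize returns a long-running operation; result() blocks until
    # it finishes or the timeout expires.
    operation = client.batch_recognize(request=request)
    response = operation.result(timeout=timeout_s)

    transcript = response.results[gcs_uri].transcript
    for result in transcript.results:
        if result.alternatives:
            print(f"Transcript: {result.alternatives[0].transcript}")

    return transcript
```

Long recordings can take a while to process, so the timeout passed to `result()` may need to be raised well above the default used here.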

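Make a request with language-agnostic transcription enabled

The following code sample demonstrates how to make a request with language-agnostic transcription enabled.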

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_chirp_auto_detect_language(
    audio_file: str,
    region: str = "us-central1",
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file and auto-detect spoken language using Chirp.
    Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
    information on which audio encodings are supported.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        region (str): The region for the API endpoint.
    Returns:
        cloud_speech.RecognizeResponse: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{region}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["auto"],  # Set language code to auto to detect language.
        model="chirp",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{region}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")

    return response
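
Because each result in the sample above reports its own `language_code`, the transcripts can be grouped by detected language after the call returns. The sketch below shows one way to do that. The function name is hypothetical, and the demo uses stand-in objects shaped like a `RecognizeResponse` instead of calling the API. It also assumes that a result may arrive with no alternatives and skips such results.

```python
from collections import defaultdict
from types import SimpleNamespace


def transcripts_by_language(response) -> dict[str, list[str]]:
    """Group the top transcript of each result by its reported language code."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for result in response.results:
        if not result.alternatives:
            continue  # nothing to read from this result
        grouped[result.language_code].append(result.alternatives[0].transcript)
    return dict(grouped)


# Stand-in objects with the same attribute names as the API response,
# so the sketch runs without credentials or network access.
demo_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            language_code="en-us",
            alternatives=[SimpleNamespace(transcript="hello there")],
        ),
        SimpleNamespace(
            language_code="es-es",
            alternatives=[SimpleNamespace(transcript="hola")],
        ),
        SimpleNamespace(language_code="en-us", alternatives=[]),
    ]
)
print(transcripts_by_language(demo_response))
```

With a real response, pass the object returned by `transcribe_chirp_auto_detect_language` to the same function.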

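Get started with Chirp in the Google Cloud console

Make sure that you have signed up for a Google Cloud account and created a project.
Go to Speech in the Google Cloud console.
Enable the API if it isn't already enabled.
Go to the Transcriptions subpage.
Click New Transcription.
Make sure that you have an STT workspace. If you don't have one, create one.
1. Open the Workspace drop-down and click New Workspace.
2. In the Create a new workspace navigation sidebar, click Browse.
3. Click to create a bucket.
4. Enter a name for your bucket and click Continue.
5. Click Create.
6. After your bucket is created, click Select to select your bucket.
7. Click Create to finish creating your workspace for Speech-to-Text.
Perform a transcription on your audio.
1. On the New Transcription page, choose an option for selecting your audio file:
  - Click Local upload to upload a file.
  - Click Cloud Storage to specify an existing Cloud Storage file.
Note: Speech-to-Text attempts to automatically assess your audio file parameters.
2. Click Continue.
3. In the Transcription options section, select the spoken language that you plan to use for recognition with Chirp from your previously created recognizer.
4. In the Model* drop-down, select Chirp.
5. In the Region drop-down, select a region, such as us-central1.
6. Click Continue.
7. To run your first recognition request using Chirp, click Submit in the main section.
View your Chirp transcription result.
1. On the Transcriptions page, click the name of the transcription.
2. On the Transcription details page, view your transcription result, and optionally play back the audio in the browser.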

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Optional: Revoke the authentication credentials that you created, and delete the local credential file.
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

Console

In the Google Cloud console, go to the Manage resources page.

Go to Manage resources

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

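What's next

Practice transcribing short audio files.
Learn how to transcribe streaming audio.
Learn how to transcribe long audio files.
For best performance, accuracy, and other tips, see the best practices documentation.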
