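Transcribe long audio files into text

This page describes how to use the Speech-to-Text API and asynchronous speech recognition to transcribe long audio files (longer than one minute) to text.

Introduction to asynchronous speech recognition

Batch speech recognition starts a long-running audio processing operation. Use asynchronous speech recognition to transcribe audio that is longer than 60 seconds. For shorter audio, synchronous speech recognition is faster and simpler. The upper limit for asynchronous speech recognition is 480 minutes (8 hours).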
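
The duration limits above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the Speech-to-Text client library: the function name and the constants are hypothetical, and treating audio of exactly 60 seconds as eligible for synchronous recognition is an assumption.

```python
# Limits taken from the text above. The names are hypothetical helpers,
# not part of the Speech-to-Text client library.
SYNC_LIMIT_SECONDS = 60          # longer audio calls for batch recognition
BATCH_LIMIT_SECONDS = 480 * 60   # 480 minutes (8 hours)


def choose_recognition_method(duration_seconds: float) -> str:
    """Returns "synchronous" or "batch" for an audio file of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        # Assumption: exactly 60 seconds is still treated as short audio.
        return "synchronous"
    if duration_seconds <= BATCH_LIMIT_SECONDS:
        return "batch"
    raise ValueError("audio is longer than the 480-minute limit")


if __name__ == "__main__":
    print(choose_recognition_method(45))    # synchronous
    print(choose_recognition_method(3600))  # batch
```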

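Batch speech recognition can transcribe only audio that is stored in Cloud Storage. The transcript output can be returned inline in the response (for single-file batch recognition requests) or written to Cloud Storage.

A batch recognition request returns an Operation that contains information about the ongoing recognition processing for the request. You can poll the operation to learn when it is complete and when the transcripts are available.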
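
Polling can be sketched as a simple loop. This is a minimal illustration, assuming only that the operation object exposes `done()` and `result()`, as the long-running operation returned by the Python client does. The helper name `wait_until_done` and its parameters are hypothetical.

```python
import time


def wait_until_done(operation, poll_interval: float = 5.0, max_wait: float = 600.0):
    """Polls an operation until it finishes, then returns its result.

    `operation` is any object with done() and result() methods.
    Raises TimeoutError if the operation is still running after max_wait seconds.
    """
    deadline = time.monotonic() + max_wait
    while not operation.done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation not finished after {max_wait} seconds")
        time.sleep(poll_interval)
    return operation.result()
```

The samples later on this page instead call `operation.result(timeout=120)`, which blocks until the operation finishes or the timeout expires. An explicit loop like this one is mainly useful when you want to log progress or do other work between polls.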

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Go to IAM
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
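Go to IAM
Select the project.
Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
In the Select a role list, select a role.
To grant additional roles, click Add another role and add each additional role.
Click Save.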

Install the Google Cloud CLI.

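Note: If you installed the gcloud CLI previously, run gcloud components update to make sure that you have the latest version.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: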

gcloud init

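Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see Authenticate for using client libraries.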

If you're using a local shell, then create local authentication credentials for your user account:
```
gcloud auth application-default login
```
You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Also make sure that you have installed the client library.

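Enable access to Cloud Storage

Speech-to-Text uses a service account to access your files in Cloud Storage. By default, the service account has access to Cloud Storage files in the same project.

The service account email address is the following: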

service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com
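
If you script the IAM bindings, you can derive the address from the project number. The following is a minimal sketch based only on the address pattern shown above. The function names are hypothetical, and the check that the project number consists of digits is an assumption.

```python
def speech_service_agent_email(project_number: str) -> str:
    """Builds the Speech-to-Text service account address for a project number."""
    project_number = str(project_number)
    if not project_number.isdigit():
        # Assumption: a project number is numeric, unlike a project ID.
        raise ValueError("expected a numeric project number, not a project ID")
    return f"service-{project_number}@gcp-sa-speech.iam.gserviceaccount.com"


def speech_service_agent_member(project_number: str) -> str:
    """Returns the value to pass to --member in an IAM policy binding."""
    return f"serviceAccount:{speech_service_agent_email(project_number)}"
```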

To transcribe Cloud Storage files in another project, you can grant this service account the Speech-to-Text Service Agent role in the other project:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/speech.serviceAgent

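For more information about project IAM policies, see Manage access to projects, folders, and organizations.

You can also give the service account more granular access by granting it permissions on a specific Cloud Storage bucket: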

gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/storage.admin

For more information about managing access to Cloud Storage, see Create and manage access control lists in the Cloud Storage documentation.

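Perform batch recognition with inline results

The following example performs batch speech recognition on an audio file in Cloud Storage and reads the transcription results inline from the response: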

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_gcs_input_inline_output_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
        The transcription results are returned inline in the response.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript
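
The sample above prints one line per result. If you want the whole transcript as a single string, a small helper like the following can join the segments. This is a sketch, not part of the client library: `join_transcript` is a hypothetical name, and skipping results that carry no alternatives is a defensive assumption rather than documented behavior.

```python
from types import SimpleNamespace


def join_transcript(results) -> str:
    """Joins the top alternative of each result into one string.

    `results` is any iterable of objects with an `alternatives` sequence
    whose items have a `transcript` attribute.
    """
    parts = []
    for result in results:
        if not result.alternatives:
            # Assumption: a result may arrive without alternatives; skip it
            # instead of failing on alternatives[0].
            continue
        text = result.alternatives[0].transcript.strip()
        if text:
            parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    # Stand-in objects that mimic the shape of the response for illustration.
    fake_results = [
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello ")]),
        SimpleNamespace(alternatives=[]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="world")]),
    ]
    print(join_transcript(fake_results))  # hello world
```

With the sample above, you would call it as `join_transcript(response.results[audio_uri].transcript.results)`.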

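Perform batch recognition and write results to Cloud Storage

The following example performs batch speech recognition on an audio file in Cloud Storage and reads the transcription results from the output file in Cloud Storage. Note that the file written to Cloud Storage is a BatchRecognizeResults message in JSON format: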

Python

import os
import re

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_gcs_input_gcs_output_v2(
    audio_uri: str,
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript will be stored.
            E.g., gs://[BUCKET]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the URI of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    file_results = response.results[audio_uri]

    print(f"Operation finished. Fetching results from {file_results.uri}...")
    output_bucket, output_object = re.match(
        r"gs://([^/]+)/(.*)", file_results.uri
    ).group(1, 2)

    # Instantiates a Cloud Storage client
    storage_client = storage.Client()

    # Fetch results from Cloud Storage
    bucket = storage_client.bucket(output_bucket)
    blob = bucket.blob(output_object)
    results_bytes = blob.download_as_bytes()
    batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
        results_bytes, ignore_unknown_fields=True
    )

    for result in batch_recognize_results.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return batch_recognize_results
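
Because the output file is JSON, you can also inspect it without the Speech-to-Text types, for example in a downstream job that only needs the text. The following sketch uses only the standard library. The function name and the sample payload are hypothetical, and the sketch assumes the file has a top-level `results` list whose items contain `alternatives` with a `transcript` field, mirroring the attributes the sample above reads.

```python
import json
from typing import List, Union


def transcripts_from_json(payload: Union[str, bytes]) -> List[str]:
    """Extracts the top transcript of each result from a results JSON payload."""
    data = json.loads(payload)
    transcripts = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


if __name__ == "__main__":
    # Hypothetical payload shaped like the fields the sample above reads.
    sample = b'{"results": [{"alternatives": [{"transcript": "hello world"}]}, {"alternatives": []}]}'
    print(transcripts_from_json(sample))  # ['hello world']
```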

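Perform batch recognition on multiple files

The following example performs batch speech recognition on multiple audio files in Cloud Storage and reads the transcription results from the output files in Cloud Storage: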

Python

import os
import re
from typing import List

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_multiple_files_v2(
    audio_uris: List[str],
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResponse:
    """Transcribes audio from multiple Google Cloud Storage URIs using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.
    Args:
        audio_uris (List[str]): The list of Google Cloud Storage URIs of the input audio files.
            E.g., ["gs://[BUCKET]/[FILE]", "gs://[BUCKET]/[FILE]"]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript will be stored.
            E.g., gs://[BUCKET]
    Returns:
        cloud_speech.BatchRecognizeResponse: The response containing the URIs of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    files = [cloud_speech.BatchRecognizeFileMetadata(uri=uri) for uri in audio_uris]

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=files,
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    # Instantiates a Cloud Storage client once and reuses it for every file
    storage_client = storage.Client()

    print("Operation finished. Fetching results from:")
    for uri in audio_uris:
        file_results = response.results[uri]
        print(f"  {file_results.uri}...")
        output_bucket, output_object = re.match(
            r"gs://([^/]+)/(.*)", file_results.uri
        ).group(1, 2)

        # Fetch results from Cloud Storage
        bucket = storage_client.bucket(output_bucket)
        blob = bucket.blob(output_object)
        results_bytes = blob.download_as_bytes()
        batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
            results_bytes, ignore_unknown_fields=True
        )

        for result in batch_recognize_results.results:
            print(f"     Transcript: {result.alternatives[0].transcript}")

    return response
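
The samples above split the output URI with `re.match(...).group(1, 2)`, which fails with an `AttributeError` if the URI does not match. The following is a minimal sketch of a stricter helper. The name `split_gcs_uri` is hypothetical, and requiring a non-empty object name is a choice made for this sketch.

```python
import re
from typing import Tuple

# Bucket name, then a non-empty object name.
_GCS_URI = re.compile(r"gs://([^/]+)/(.+)")


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Splits gs://BUCKET/OBJECT into (bucket, object_name)."""
    match = _GCS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a Cloud Storage object URI: {uri!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(split_gcs_uri("gs://my-bucket/transcripts/audio.json"))
```

Raising a `ValueError` that includes the offending URI makes a missing or malformed output location easier to diagnose than a failure on `None`.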

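Enable dynamic batching on batch recognition

Dynamic batching lowers the cost of transcription in exchange for higher latency. This feature is available only for batch recognition.

The following example performs batch recognition on an audio file in Cloud Storage with dynamic batching enabled: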

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_dynamic_batching_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using dynamic batching.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio.
        E.g., gs://[BUCKET]/[FILE]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
        processing_strategy=cloud_speech.BatchRecognizeRequest.ProcessingStrategy.DYNAMIC_BATCHING,
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript

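Override recognition features per file

By default, batch recognition uses the same recognition configuration for every file in the batch recognition request. If different files require different configuration or features, you can override the configuration per file by using the config field in the BatchRecognizeFileMetadata message. For an example of overriding recognition features, see the recognizers documentation.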
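
A per-file override can be sketched as follows. This is an illustration under stated assumptions, not the documented sample: the helper names, the bucket paths, and the language codes are hypothetical, and the sketch passes a complete RecognitionConfig for each overridden file rather than relying on partial-override semantics. Files without an override keep the request-level config.

```python
from typing import Dict, List, Optional, Tuple


def plan_language_overrides(
    audio_uris: List[str],
    overrides: Dict[str, List[str]],
) -> List[Tuple[str, Optional[List[str]]]]:
    """Pairs each URI with its language codes, or None to use the request config."""
    return [(uri, overrides.get(uri)) for uri in audio_uris]


def build_file_metadata(plan: List[Tuple[str, Optional[List[str]]]]) -> list:
    """Builds BatchRecognizeFileMetadata messages from a plan.

    Requires the google-cloud-speech package; it is imported here so that the
    planning step above stays usable without it.
    """
    from google.cloud.speech_v2.types import cloud_speech

    files = []
    for uri, language_codes in plan:
        if language_codes is None:
            # No override: the request-level config applies to this file.
            files.append(cloud_speech.BatchRecognizeFileMetadata(uri=uri))
            continue
        files.append(
            cloud_speech.BatchRecognizeFileMetadata(
                uri=uri,
                config=cloud_speech.RecognitionConfig(
                    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
                    language_codes=language_codes,
                    model="long",
                ),
            )
        )
    return files


if __name__ == "__main__":
    plan = plan_language_overrides(
        ["gs://my-bucket/english.wav", "gs://my-bucket/french.wav"],
        {"gs://my-bucket/french.wav": ["fr-FR"]},
    )
    print(plan)
```

You would pass the returned list as `files` in the BatchRecognizeRequest shown in the multiple-file sample.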

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Optional: Revoke the authentication credentials that you created, and delete the local credential file.
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

Console

In the Google Cloud console, go to the Manage resources page.

Go to Manage resources

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

Delete the Google Cloud project:

gcloud projects delete PROJECT_ID

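What's next

See the reference documentation to learn about batch recognition.
Learn how to transcribe streaming audio.
Practice transcribing short audio files.
Transcribe audio files with Chirp.
For best performance, accuracy, and other tips, see the best practices documentation.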
