Send a recognition request with model adaptation

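You can improve the accuracy of Speech-to-Text transcription results by using model adaptation. Model adaptation lets you specify words and/or phrases that Speech-to-Text should recognize in your audio data more often than the alternatives it might otherwise suggest. It is particularly useful for improving transcription accuracy in the following cases: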

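  1. Your audio contains words or phrases that are likely to occur frequently.
  2. Your audio is likely to contain rare words (such as proper names) or words that do not exist in general use.
  3. Your audio contains noise or is otherwise not very clear.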

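For more information about using this feature, see "Improve transcription results with model adaptation". For the phrase and character limits that apply to each model adaptation request, see "Quotas and limits". Not all models support speech adaptation. See Language support to find out which models support adaptation.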

Code sample

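Speech adaptation is an optional Speech-to-Text configuration that you can use to customize transcription results according to your needs. For more information about configuring the recognition request body, see the RecognitionConfig documentation.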

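The following code sample shows how to improve transcription accuracy by using a SpeechAdaptation resource: PhraseSet, CustomClass, and model adaptation boost. To use a PhraseSet or CustomClass in future requests, make a note of its resource name, which is returned in the response when you create the resource.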
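As a minimal sketch of reusing stored resources, the helpers below rebuild the fully qualified names of a PhraseSet and a CustomClass from their IDs and place the phrase set name in an adaptation configuration. The function names `custom_class_name`, `phrase_set_name`, and `adaptation_config`, along with the project and resource IDs, are hypothetical; the name patterns follow the `parent` format used in the sample on this page. The sketch uses plain dictionaries and makes no API calls.

```python
from typing import Dict, List


def custom_class_name(project_id: str, location: str, custom_class_id: str) -> str:
    """Build the resource name of a CustomClass from its parts."""
    return f"projects/{project_id}/locations/{location}/customClasses/{custom_class_id}"


def phrase_set_name(project_id: str, location: str, phrase_set_id: str) -> str:
    """Build the resource name of a PhraseSet from its parts."""
    return f"projects/{project_id}/locations/{location}/phraseSets/{phrase_set_id}"


def adaptation_config(phrase_set_names: List[str]) -> Dict[str, List[str]]:
    """Return a dict shaped like the adaptation settings used in the sample.

    Assumption: the dict mirrors the `phrase_set_references` field and is
    only an illustration, not a call into the client library.
    """
    return {"phrase_set_references": list(phrase_set_names)}


# Hypothetical project and phrase set IDs, for illustration only.
name = phrase_set_name("my-project", "global", "my-phrase-set")
print(adaptation_config([name]))
```

A dictionary of this shape mirrors the `phrase_set_references` field passed to `speech.SpeechAdaptation` in the sample. When the `name` returned by the create call is available, prefer it over a rebuilt string, since that value comes directly from the service.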

For a list of the pre-built classes available for your language, see "Supported class tokens".
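Class tokens appear inside phrase values. The sketch below contrasts the two forms: a custom class is referenced by wrapping its full resource name in `${...}`, as the sample on this page does, while a pre-built class is referenced by its token, such as `$MONTH`. The helper names and the example phrases are hypothetical, and you should confirm that a given pre-built token is listed for your language before relying on it.

```python
def custom_class_token(custom_class_name: str) -> str:
    """Wrap a CustomClass resource name as a class token: ${name}.

    In an f-string, "{{" and "}}" emit literal braces, so the three
    braces on each side produce one literal brace plus the substitution.
    """
    return f"${{{custom_class_name}}}"


def prebuilt_class_token(class_name: str) -> str:
    """Format a pre-built class token such as $MONTH."""
    return f"${class_name}"


# Hypothetical resource name, following the pattern used in the sample.
class_name = "projects/my-project/locations/global/customClasses/my-class"

custom_phrase = "Visit restaurants like " + custom_class_token(class_name)
prebuilt_phrase = "My birthday is in " + prebuilt_class_token("MONTH")

print(custom_phrase)
print(prebuilt_phrase)
```

Building the token in a small helper keeps the brace escaping in one place, which makes the phrase strings easier to read than an inline f-string with triple braces.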

Python

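To learn how to install and use the client library for Speech-to-Text, see the Speech-to-Text client libraries documentation. For more information, see the Speech-to-Text Python API reference documentation.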

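To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".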

import os

from google.cloud import speech_v1p1beta1 as speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_with_model_adaptation(
    audio_uri: str,
    custom_class_id: str,
    phrase_set_id: str,
) -> str:
    """Create `PhraseSet` and `CustomClasses` for custom item lists in input data.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio. e.g. gs://[BUCKET]/[FILE]
        custom_class_id (str): The unique ID of the custom class to create.
        phrase_set_id (str): The unique ID of the PhraseSet to create.
    Returns:
        The transcript of the input audio.
    """
    # Specifies the location where the Speech API will be accessed.
    location = "global"

    # Audio object
    audio = speech.RecognitionAudio(uri=audio_uri)

    # Create the adaptation client
    adaptation_client = speech.AdaptationClient()

    # The parent resource where the custom class and phrase set will be created.
    parent = f"projects/{PROJECT_ID}/locations/{location}"

    # Create the custom class resource
    adaptation_client.create_custom_class(
        {
            "parent": parent,
            "custom_class_id": custom_class_id,
            "custom_class": {
                "items": [
                    {"value": "sushido"},
                    {"value": "altura"},
                    {"value": "taneda"},
                ]
            },
        }
    )
    custom_class_name = (
        f"projects/{PROJECT_ID}/locations/{location}/customClasses/{custom_class_id}"
    )
    # Create the phrase set resource
    phrase_set_response = adaptation_client.create_phrase_set(
        {
            "parent": parent,
            "phrase_set_id": phrase_set_id,
            "phrase_set": {
                "boost": 10,
                "phrases": [
                    {"value": f"Visit restaurants like ${{{custom_class_name}}}"}
                ],
            },
        }
    )
    phrase_set_name = phrase_set_response.name
    # The next section shows how to use the newly created custom
    # class and phrase set to send a transcription request with speech adaptation

    # Speech adaptation configuration
    speech_adaptation = speech.SpeechAdaptation(phrase_set_references=[phrase_set_name])

    # speech configuration object
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
    )

    # Create the speech client
    speech_client = speech.SpeechClient()

    response = speech_client.recognize(config=config, audio=audio)

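    # Print the top alternative of each result and collect the transcripts.
    transcripts = []
    for result in response.results:
        transcript = result.alternatives[0].transcript
        print(f"Transcript: {transcript}")
        transcripts.append(transcript.strip())

    # Return the transcript, as the signature and docstring promise.
    return " ".join(transcripts)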