啟用字詞層級信心值

您可以指定讓語音轉文字針對轉錄中的個別字詞,標明準確率或信賴度的值。

字詞層級信賴度

「語音轉文字」對音訊剪輯執行語音轉錄時,也會測量回應的準確度。從語音轉文字功能傳送的回應會以 0.0 至 1.0 的數字,表明整個語音轉錄要求的信賴度。下列程式碼範例示範了語音轉文字傳回的信賴度。

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.96748614
        }
      ]
    }
  ]
}

除了整個語音轉錄的信賴度外,語音轉文字亦可提供語音轉錄中個別字詞的信賴度資訊。回覆隨後會在語音轉錄中加入 WordInfo 詳細資料,指出個別字詞的信賴度,如下列範例所示。

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startTime": "0s",
              "endTime": "0.300s",
              "word": "how",
              "confidence": SOME NUMBER
            },
            ...
          ]
        }
      ]
    }
  ]
}

在要求中啟用字詞層級信心值

下列程式碼片段示範如何使用本機和遠端檔案,在語音轉文字的語音轉錄要求中啟用字詞層級的信賴度

使用本機檔案

通訊協定

如要瞭解完整的詳細資訊,請參閱 speech:recognize API 端點。

如要執行同步語音辨識,請提出 POST 要求並提供適當的要求內容。以下為使用 curlPOST 要求示例。這個範例使用 Google Cloud CLI 產生存取權杖。如需安裝 gcloud CLI 的操作說明,請參閱快速入門導覽課程

以下範例說明如何使用 curl 傳送 POST 要求,其中要求主體會啟用字詞層級的信賴度。

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    --data '{
    "config": {
        "encoding": "FLAC",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "enableWordTimeOffsets": true,
        "enableWordConfidence": true
    },
    "audio": {
        "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
    }
}' > word-level-confidence.txt

如果要求成功,伺服器會傳回 200 OK HTTP 狀態碼與 JSON 格式的回應,並另存成名為 word-level-confidence.txt 的檔案。

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startTime": "0s",
              "endTime": "0.300s",
              "word": "how",
              "confidence": 0.98762906
            },
            {
              "startTime": "0.300s",
              "endTime": "0.600s",
              "word": "old",
              "confidence": 0.96929157
            },
            {
              "startTime": "0.600s",
              "endTime": "0.800s",
              "word": "is",
              "confidence": 0.98271006
            },
            {
              "startTime": "0.800s",
              "endTime": "0.900s",
              "word": "the",
              "confidence": 0.98271006
            },
            {
              "startTime": "0.900s",
              "endTime": "1.100s",
              "word": "Brooklyn",
              "confidence": 0.98762906
            },
            {
              "startTime": "1.100s",
              "endTime": "1.500s",
              "word": "Bridge",
              "confidence": 0.98762906
            }
          ]
        }
      ],
      "languageCode": "en-us"
    }
  ]
}

Java

如要瞭解如何安裝及使用 Speech-to-Text 的用戶端程式庫,請參閱這篇文章。 詳情請參閱 Speech-to-Text Java API 參考說明文件

如要向語音轉文字服務進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。

/**
 * Transcribe a local audio file with word level confidence
 *
 * @param fileName the path to the local audio file
 */
public static void transcribeWordLevelConfidence(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speechClient = SpeechClient.create()) {
    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    // Configure request to enable word level confidence
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("en-US")
            .setEnableWordConfidence(true)
            .build();
    // Perform the transcription request
    RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

    // Print out the results
    for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternatives(0);
      System.out.format("Transcript : %s\n", alternative.getTranscript());
      System.out.format(
          "First Word and Confidence : %s %s \n",
          alternative.getWords(0).getWord(), alternative.getWords(0).getConfidence());
    }
  }
}

Node.js

如要瞭解如何安裝及使用 Speech-to-Text 的用戶端程式庫,請參閱這篇文章。 詳情請參閱 Speech-to-Text Node.js API 參考說明文件

如要向語音轉文字服務進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。

const fs = require('fs');

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const fileName = 'Local path to audio file, e.g. /path/to/audio.raw';

const config = {
  encoding: 'FLAC',
  sampleRateHertz: 16000,
  languageCode: 'en-US',
  enableWordConfidence: true,
};

const audio = {
  content: fs.readFileSync(fileName).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
const confidence = response.results
  .map(result => result.alternatives[0].confidence)
  .join('\n');
console.log(`Transcription: ${transcription} \n Confidence: ${confidence}`);

console.log('Word-Level-Confidence:');
const words = response.results.map(result => result.alternatives[0]);
words[0].words.forEach(a => {
  console.log(` word: ${a.word}, confidence: ${a.confidence}`);
});

Python

如要瞭解如何安裝及使用 Speech-to-Text 的用戶端程式庫,請參閱這篇文章。 詳情請參閱 Speech-to-Text Python API 參考說明文件

如要向語音轉文字服務進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

speech_file = "resources/Google_Gnome.wav"

with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_confidence=True,
)

response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    alternative = result.alternatives[0]
    print("-" * 20)
    print(f"First alternative of result {i}")
    print(f"Transcript: {alternative.transcript}")
    print(
        "First Word and Confidence: ({}, {})".format(
            alternative.words[0].word, alternative.words[0].confidence
        )
    )

return response.results

使用遠端檔案

Java

如要瞭解如何安裝及使用 Speech-to-Text 的用戶端程式庫,請參閱這篇文章。 詳情請參閱 Speech-to-Text Java API 參考說明文件

如要向語音轉文字服務進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。

/**
 * Transcribe a remote audio file with word level confidence
 *
 * @param gcsUri path to the remote audio file
 */
public static void transcribeWordLevelConfidenceGcs(String gcsUri) throws Exception {
  try (SpeechClient speechClient = SpeechClient.create()) {

    // Configure request to enable word level confidence
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.FLAC)
            .setSampleRateHertz(44100)
            .setLanguageCode("en-US")
            .setEnableWordConfidence(true)
            .build();

    // Set the remote path for the audio file
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use non-blocking call for getting file transcription
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speechClient.longRunningRecognizeAsync(config, audio);

    while (!response.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(10000);
    }
    // Just print the first result here.
    SpeechRecognitionResult result = response.get().getResultsList().get(0);

    // There can be several alternative transcripts for a given chunk of speech. Just use the
    // first (most likely) one here.
    SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
    // Print out the result
    System.out.printf("Transcript : %s\n", alternative.getTranscript());
    System.out.format(
        "First Word and Confidence : %s %s \n",
        alternative.getWords(0).getWord(), alternative.getWords(0).getConfidence());
  }
}

Node.js

如要瞭解如何安裝及使用 Speech-to-Text 的用戶端程式庫,請參閱這篇文章。 詳情請參閱 Speech-to-Text Node.js API 參考說明文件

如要向語音轉文字服務進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const uri = path to GCS audio file e.g. `gs:/bucket/audio.wav`;

const config = {
  encoding: 'FLAC',
  sampleRateHertz: 16000,
  languageCode: 'en-US',
  enableWordConfidence: true,
};

const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
const confidence = response.results
  .map(result => result.alternatives[0].confidence)
  .join('\n');
console.log(`Transcription: ${transcription} \n Confidence: ${confidence}`);

console.log('Word-Level-Confidence:');
const words = response.results.map(result => result.alternatives[0]);
words[0].words.forEach(a => {
  console.log(` word: ${a.word}, confidence: ${a.confidence}`);
});

Python

如要瞭解如何安裝及使用 Speech-to-Text 的用戶端程式庫,請參閱這篇文章。 詳情請參閱 Speech-to-Text Python API 參考說明文件

如要向語音轉文字服務進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


from google.cloud import speech_v1p1beta1 as speech


def transcribe_file_with_word_level_confidence(audio_uri: str) -> str:
    """Transcribe a remote audio file with word level confidence.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio.
            E.g., gs://[BUCKET]/[FILE]
    Returns:
        The generated transcript from the audio file provided with word level confidence.
    """

    client = speech.SpeechClient()

    # Configure request to enable word level confidence
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=44100,
        language_code="en-US",
        enable_word_confidence=True,  # Enable word level confidence
    )

    # Set the remote path for the audio file
    audio = speech.RecognitionAudio(uri=audio_uri)

    # Use non-blocking call for getting file transcription
    response = client.long_running_recognize(config=config, audio=audio).result(
        timeout=300
    )

    transcript_builder = []
    for i, result in enumerate(response.results):
        alternative = result.alternatives[0]
        transcript_builder.append("-" * 20)
        transcript_builder.append(f"\nFirst alternative of result {i}")
        transcript_builder.append(f"\nTranscript: {alternative.transcript}")
        transcript_builder.append(
            "\nFirst Word and Confidence: ({}, {})".format(
                alternative.words[0].word, alternative.words[0].confidence
            )
        )

    transcript = "".join(transcript_builder)
    print(transcript)

    return transcript