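Transcribe short audio files

This page shows how to transcribe a short audio file to text by using synchronous speech recognition.

Synchronous speech recognition returns the recognized text for short audio (less than 60 seconds). To process a speech recognition request for audio longer than 60 seconds, use asynchronous speech recognition.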
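
As a rough illustration of the 60-second limit, the following sketch checks the duration of a WAV file before choosing between synchronous and asynchronous recognition. It uses only the Python standard library; the helper names (`wav_duration_seconds`, `fits_synchronous`, `make_silent_wav`) are hypothetical and not part of any Speech-to-Text library, and the check applies only to WAV input.

```python
import io
import wave

# Limit stated on this page for synchronous recognition.
SYNC_LIMIT_SECONDS = 60


def wav_duration_seconds(wav_file) -> float:
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_synchronous(wav_file) -> bool:
    """Return True if the audio is shorter than the synchronous limit."""
    return wav_duration_seconds(wav_file) < SYNC_LIMIT_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


print(fits_synchronous(make_silent_wav(5)))   # short clip
print(fits_synchronous(make_silent_wav(61)))  # longer than the limit
```

The same `wave` call also exposes the sample rate (`getframerate()`), which can help when filling in the sample rate of a request.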

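You can send audio content directly to Speech-to-Text from a local file, or have Speech-to-Text process audio content stored in a Google Cloud Storage bucket. For the limits on synchronous speech recognition requests, see the quotas and limits page.

Perform synchronous speech recognition on a local file

Here is an example of performing synchronous speech recognition on a local audio file: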

REST

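For complete details, see the speech:recognize API endpoint. For more information on configuring the request body, see the RecognitionConfig reference documentation.

The audio content supplied in the request body must be Base64-encoded. For more information on how to Base64-encode audio, see "Base64 encoding audio content". For more information on the content field, see RecognitionAudio.

Before using any of the request data, make the following replacements:

LANGUAGE_CODE: the BCP-47 code of the language spoken in the audio clip.
ENCODING: the encoding of the audio to transcribe.
SAMPLE_RATE_HERTZ: the sample rate, in hertz, of the audio to transcribe.
ENABLE_WORD_TIME_OFFSETS: enable this field if you want word start and end time offsets (timestamps) to be returned.
INPUT_AUDIO: a Base64-encoded string of the audio data to transcribe.
PROJECT_ID: the alphanumeric ID of your Google Cloud project.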

HTTP method and URL:

POST https://speech.googleapis.com/v1/speech:recognize

Request JSON body:

{
  "config": {
      "languageCode": "LANGUAGE_CODE",
      "encoding": "ENCODING",
      "sampleRateHertz": SAMPLE_RATE_HERTZ,
      "enableWordTimeOffsets": ENABLE_WORD_TIME_OFFSETS
  },
  "audio": {
    "content": "INPUT_AUDIO"
  }
}
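
The request body above can also be assembled programmatically. The following is a minimal sketch, assuming the audio bytes are already in memory; the function name `build_recognize_request` and the four sample bytes are illustrative only. It Base64-encodes the bytes into the `content` field and fills in the config placeholders.

```python
import base64
import json


def build_recognize_request(audio_bytes: bytes, language_code: str,
                            encoding: str, sample_rate_hertz: int,
                            enable_word_time_offsets: bool = False) -> dict:
    """Return a dict shaped like the JSON request body shown above."""
    return {
        "config": {
            "languageCode": language_code,
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "enableWordTimeOffsets": enable_word_time_offsets,
        },
        "audio": {
            # The content field carries Base64 text, not raw bytes.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


# Placeholder bytes stand in for real audio data in this sketch.
body = build_recognize_request(b"\x00\x01\x02\x03", "en-US", "LINEAR16", 16000)
print(json.dumps(body, indent=2))
```

Writing `json.dumps(body)` to a file named request.json would produce a body in the shape that the commands in this section expect.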

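To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which logs you in to the gcloud CLI automatically. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: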

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://speech.googleapis.com/v1/speech:recognize"
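
For comparison, the same HTTP request can be described with the Python standard library. This is only a sketch: `build_recognize_http_request` is a hypothetical helper, the access token is assumed to have been obtained separately (for example, from `gcloud auth print-access-token`), and the code builds the request object without sending it.

```python
import json
import urllib.request

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_http_request(body: dict, access_token: str,
                                 project_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl command."""
    return urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values; replace them with a real body, token, and project ID.
req = build_recognize_http_request({"config": {}, "audio": {}}, "TOKEN", "PROJECT_ID")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here because it requires valid credentials.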

PowerShell (Windows)

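Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: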

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://speech.googleapis.com/v1/speech:recognize" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
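
A response of this shape can be reduced to its transcripts with a few lines of code. The sketch below parses the sample response shown above and collects the first alternative of each result; the helper name `top_transcripts` is hypothetical, and the use of `.get` with defaults is a defensive assumption for responses in which a field is absent.

```python
import json

RESPONSE_JSON = """
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
"""


def top_transcripts(response: dict) -> list[tuple[str, float]]:
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            best = alternatives[0]
            pairs.append((best.get("transcript", ""), best.get("confidence", 0.0)))
    return pairs


for transcript, confidence in top_transcripts(json.loads(RESPONSE_JSON)):
    print(f"{transcript} (confidence={confidence:.3f})")
```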

gcloud

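For complete details, see the recognize command.

To perform speech recognition on a local file, use the Google Cloud CLI and pass the local path of the file to transcribe.

gcloud ml speech recognize PATH-TO-LOCAL-FILE --language-code='en-US'

If the request is successful, the server returns a response in JSON format: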

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}

Go

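To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Go API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".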


func recognize(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := speech.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	data, err := os.ReadFile(file)
	if err != nil {
		return err
	}

	// Send the contents of the audio file, along with the encoding
	// and sample rate information, to be transcribed.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Content{Content: data},
		},
	})
	// Check the error before touching resp, which is nil on failure.
	if err != nil {
		return err
	}

	// Print the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Fprintf(w, "\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
	return nil
}

Java

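To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Java API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".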

/**
 * Performs speech recognition on raw PCM audio and prints the transcription.
 *
 * @param fileName the path to a PCM audio file to transcribe.
 */
public static void syncRecognizeFile(String fileName) throws Exception {
  try (SpeechClient speech = SpeechClient.create()) {
    Path path = Paths.get(fileName);
    byte[] data = Files.readAllBytes(path);
    ByteString audioBytes = ByteString.copyFrom(data);

    // Configure request with local raw PCM audio
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .setSampleRateHertz(16000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setContent(audioBytes).build();

    // Use blocking call to get audio transcript
    RecognizeResponse response = speech.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();

    for (SpeechRecognitionResult result : results) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
      System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
  }
}

Node.js

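To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Node.js API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".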

// Imports the Google Cloud client library
const fs = require('fs');
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};
const audio = {
  content: fs.readFileSync(filename).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file
const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
console.log('Transcription: ', transcription);

Python

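To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Python API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".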

from google.cloud import speech


def transcribe_file(audio_file: str) -> speech.RecognizeResponse:
    """Transcribe the given audio file.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        speech.RecognizeResponse: The response containing the transcription results
    """
    client = speech.SpeechClient()

    with open(audio_file, "rb") as f:
        audio_content = f.read()

    audio = speech.RecognitionAudio(content=audio_content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

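Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Speech-to-Text reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Speech-to-Text reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Speech-to-Text reference documentation for Ruby.

Perform synchronous speech recognition on a remote file

For convenience, the Speech-to-Text API can perform synchronous speech recognition directly on an audio file located in Google Cloud Storage, so you do not need to send the contents of the audio file in the request body.

Here is an example of performing synchronous speech recognition on a file located in Cloud Storage: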

REST

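For complete details, see the speech:recognize API endpoint. For more information on configuring the request body, see the RecognitionConfig reference documentation.

Before using any of the request data, make the following replacements:

LANGUAGE_CODE: the BCP-47 code of the language spoken in the audio clip.
ENCODING: the encoding of the audio to transcribe.
SAMPLE_RATE_HERTZ: the sample rate, in hertz, of the audio to transcribe.
ENABLE_WORD_TIME_OFFSETS: enable this field if you want word start and end time offsets (timestamps) to be returned.
STORAGE_BUCKET: a Cloud Storage bucket.
INPUT_AUDIO: the audio data file to transcribe.
PROJECT_ID: the alphanumeric ID of your Google Cloud project.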

HTTP method and URL:

POST https://speech.googleapis.com/v1/speech:recognize

Request JSON body:

{
  "config": {
      "languageCode": "LANGUAGE_CODE",
      "encoding": "ENCODING",
      "sampleRateHertz": SAMPLE_RATE_HERTZ,
      "enableWordTimeOffsets": ENABLE_WORD_TIME_OFFSETS
  },
  "audio": {
    "uri": "gs://STORAGE_BUCKET/INPUT_AUDIO"
  }
}
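
The `uri` field above follows the `gs://STORAGE_BUCKET/INPUT_AUDIO` pattern. As a small sketch, the hypothetical helpers below (not part of any Google library) build the `audio` portion of the body from a bucket and object name, and split an existing URI back into those two parts while rejecting strings that do not match the pattern.

```python
def build_gcs_audio(bucket: str, object_name: str) -> dict:
    """Return the "audio" part of the request body for a Cloud Storage object."""
    return {"uri": f"gs://{bucket}/{object_name}"}


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split gs://bucket/object into (bucket, object); raise ValueError otherwise."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"missing bucket or object name: {uri!r}")
    return bucket, object_name


audio = build_gcs_audio("cloud-samples-tests", "speech/brooklyn.flac")
print(audio["uri"])
print(split_gcs_uri(audio["uri"]))
```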

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and then run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://speech.googleapis.com/v1/speech:recognize"

PowerShell (Windows)

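Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: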

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://speech.googleapis.com/v1/speech:recognize" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
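
The sample response above does not include word-level timestamps. As a rough sketch of how they might be read when ENABLE_WORD_TIME_OFFSETS is true, the code below assumes each alternative carries a `words` list whose `startTime` and `endTime` values are strings such as "0.300s" (the usual JSON form of a protobuf duration). The sample data and the helper names are made up for illustration and are not output from the service.

```python
# Hypothetical response fragment used only to exercise the parsing code.
SAMPLE_RESPONSE = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "how old",
                    "words": [
                        {"startTime": "0s", "endTime": "0.300s", "word": "how"},
                        {"startTime": "0.300s", "endTime": "0.600s", "word": "old"},
                    ],
                }
            ]
        }
    ]
}


def parse_duration_seconds(value: str) -> float:
    """Convert a duration string such as "1.300s" to seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(value[:-1])


def word_timings(response: dict) -> list[tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) from each result's first alternative."""
    timings = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            timings.append((
                info["word"],
                parse_duration_seconds(info["startTime"]),
                parse_duration_seconds(info["endTime"]),
            ))
    return timings


for word, start, end in word_timings(SAMPLE_RESPONSE):
    print(f"{word}: {start:.3f}s to {end:.3f}s")
```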

gcloud

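For complete details, see the recognize command.

To perform speech recognition on a file in Cloud Storage, use the Google Cloud CLI and pass the Cloud Storage URI (gs://...) of the file to transcribe.

gcloud ml speech recognize 'gs://cloud-samples-tests/speech/brooklyn.flac' \
--language-code='en-US'

If the request is successful, the server returns a response in JSON format: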

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}

Go

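To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Go API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".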


func recognizeGCS(w io.Writer, gcsURI string) error {
	ctx := context.Background()

	client, err := speech.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	// Send the request with the Cloud Storage URI (gs://...), along with
	// the encoding and sample rate information, to be transcribed.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: gcsURI},
		},
	})
	// Check the error before touching resp, which is nil on failure.
	if err != nil {
		return err
	}

	// Print the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Fprintf(w, "\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
	return nil
}

Java

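To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Java API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".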

/**
 * Performs speech recognition on remote FLAC file and prints the transcription.
 *
 * @param gcsUri the path to the remote FLAC audio file to transcribe.
 */
public static void syncRecognizeGcs(String gcsUri) throws Exception {
  // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
  try (SpeechClient speech = SpeechClient.create()) {
    // Builds the request for remote FLAC file
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.FLAC)
            .setLanguageCode("en-US")
            .setSampleRateHertz(16000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use blocking call for getting audio transcript
    RecognizeResponse response = speech.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();

    for (SpeechRecognitionResult result : results) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
      System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
  }
}

Node.js

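To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Node.js API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".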

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const gcsUri = 'gs://my-bucket/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};
const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file
const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
console.log('Transcription: ', transcription);

Python

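To learn how to install and use the client library for Speech-to-Text, see the client libraries page. For more information, see the Speech-to-Text Python API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".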

# Imported at module level so the return annotation below resolves
# when the function is defined.
from google.cloud import speech


def transcribe_gcs(audio_uri: str) -> speech.RecognizeResponse:
    """Transcribes the audio file specified by audio_uri.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://cloud-samples-data/speech/audio.flac
    Returns:
        speech.RecognizeResponse: The response containing the transcription results
    """
    client = speech.SpeechClient()

    audio = speech.RecognitionAudio(uri=audio_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

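Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Speech-to-Text reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Speech-to-Text reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Speech-to-Text reference documentation for Ruby.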
