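In this quickstart, you learn how to measure and improve the accuracy of Google Cloud Speech-to-Text on your audio data. You also explore the models and options that the API offers for improving transcription accuracy, and you use the Speech-to-Text UI in the Google Cloud console together with a ground truth file to measure accuracy and gain insight into the Speech-to-Text system.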
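
Ensure you have signed up for a Google Cloud account and created a project.

1. Go to Speech in the Google Cloud console, and navigate to the Speech-to-Text UI.
2. Using an audio file that is acoustically representative of your use case and of how you plan to use the ASR system, follow the quickstart instructions to make your first transcription with Speech-to-Text.
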
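Take `gs://cloud-samples-data/speech/brooklyn_bridge.wav` as an example. Its ground truth file contains: `How old is the Brooklyn Bridge`. If you don't have a ground truth file available, download the transcription in text format, edit it as needed, and upload it as the ground truth file.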
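
To make the comparison between a transcript and its ground truth concrete, the following is a minimal sketch of Word Error Rate (WER) computed as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the lowercase-only normalization are assumptions made for illustration; the console may normalize text differently, so its numbers can differ from this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


ground_truth = "How old is the Brooklyn Bridge"
# Identical apart from letter case, so the WER is 0.0.
print(word_error_rate(ground_truth, "how old is the Brooklyn Bridge"))
# Hypothetical misrecognition: one substituted word out of six.
print(word_error_rate(ground_truth, "how old is the Brooklyn fridge"))
```

Because the denominator is the length of the ground truth, an error in the ground truth file changes the score directly, which is why the file should be checked before it is uploaded.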

# Measure and improve accuracy

Last updated (UTC): 2025-08-18.

In this quickstart, learn how to measure and improve the accuracy of Google Cloud Speech-to-Text for your audio data. Also explore the various models and options available from the API to enhance transcription accuracy. Explore how to use the Speech-to-Text UI in the Google Cloud console and a ground truth file to measure accuracy and to gain insights into the Speech-to-Text system.

Machine Learning (ML) systems are inherently subject to inaccuracies, and Automatic Speech Recognition (ASR) systems, also known as Speech-to-Text systems, are no exception. Measuring accuracy is strongly coupled to the specific use case and the system being evaluated, because differences in audio recording quality and acoustic conditions can significantly affect accuracy. As a result, a single accuracy score for all customers and use cases is impractical. To ensure that ASR systems perform reliably in critical production-facing systems, measure their performance on your own audio. It is also essential to understand how Speech-to-Text performs within the broader context of your system.

For the purposes of this quickstart, use the industry-standard method for comparison, [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate). For more information on how WER is calculated and interpreted, see [Measure and improve speech accuracy](/speech-to-text/docs/speech-accuracy).

Let's start.

Getting started with Speech-to-Text Console
-------------------------------------------

#### Permissions required for this task

To perform this task, you must have the following
[permissions](/iam/docs/overview#permissions):

- `storage.buckets.get`
- `storage.buckets.list`

At the project or bucket level:

- `storage.objects.create`
- `storage.objects.get`
- `storage.objects.list`
- `storage.objects.update`

Ensure you have signed up for a Google Cloud account and created a project.

1. Go to Speech in the Google Cloud console, and navigate to the [Speech-to-Text UI](https://console.cloud.google.com/speech).
2. Using an audio file that is acoustically representative of your use case and of how you plan to use the ASR system, follow the quickstart instructions for making your first transcription using [Speech-to-Text](https://cloud.google.com/speech-to-text/docs/transcribe-console).

Calculating Transcription Accuracy
----------------------------------

1. After you have successfully transcribed your audio file, go to the **Transcription Accuracy** section. This section remains empty until accuracy is calculated for your transcription.
2. Click the **Upload Ground Truth** button at the top of the section to begin calculating accuracy.

Specifying ground truth
-----------------------

1. To calculate the accuracy of the transcription, provide a ground truth file. This is a `.txt` or `.csv` file, usually a human-generated transcription, that contains the correct or expected transcriptions for comparison.
2. Take `gs://cloud-samples-data/speech/brooklyn_bridge.wav` as an example. Its ground truth file contains: `How old is the Brooklyn Bridge`. If you don't have a ground truth file available, download the transcription in text format, edit it as needed, and upload it as the ground truth file.
3. Using **Upload** or an existing Cloud Storage file, specify the ground truth file, and click **Save**.

Confirming ground truth
-----------------------

1. After you click **Save**, a prompt asks you to confirm that the specified ground truth file is correct. Verify that the ground truth file accurately represents the correct transcriptions, because it directly affects the accuracy metrics.
2. Click **Confirm** to proceed.

Review evaluation results
-------------------------

1. Depending on the size of the input data, the evaluation process might take some time. The results are displayed upon completion.
2. Once the evaluation is complete, the following sections are displayed:
   - The **Transcription Accuracy** table, with the accuracy metrics and a link to the ground truth file that was used in the process.
   - The **Transcription**, with a toggle for comparing it to the ground truth file, along with a breakdown of accuracy metrics and highlights.
3. Review and interpret the accuracy results to understand how the Speech-to-Text recognizer performs and to identify areas for improvement. The results vary depending on the inputs and the transcription used. The following examples show indicative accuracy results:
   - [Screenshot: an example of 0% WER]
   - [Screenshot: an example of 40% WER]

Optional: updating ground truth
-------------------------------

You can test a different ground truth against the existing transcription by reattaching a different file and then repeating steps three and four with the updated ground truth file.

Try it for yourself
-------------------

If you're new to Google Cloud, create an account to evaluate how
Speech-to-Text performs in real-world
scenarios. New customers also get $300 in free credits to run, test, and
deploy workloads.
[Try Speech-to-Text free](https://console.cloud.google.com/freetrial)
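
The breakdown of accuracy metrics shown with the evaluation results can be approximated offline. The sketch below aligns a hypothesis with its ground truth and counts correct words, substitutions, deletions, and insertions. The function name `error_breakdown`, the lowercase-only normalization, and the misrecognized transcript are assumptions for illustration, not the console's implementation. When several minimal alignments exist, the per-type counts can differ between them, although the total number of errors does not.

```python
from collections import Counter


def error_breakdown(reference: str, hypothesis: str) -> Counter:
    """Count correct, substituted, deleted, and inserted words in one minimal alignment."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    rows, cols = len(ref) + 1, len(hyp) + 1

    # dist[i][j] is the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = ref[i - 1] != hyp[j - 1]
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion

    # Walk back from the bottom-right corner, preferring the diagonal move.
    counts = Counter(correct=0, substitution=0, deletion=0, insertion=0)
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            counts["correct" if ref[i - 1] == hyp[j - 1] else "substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return counts


# Hypothetical misrecognition: "the" is dropped and "Bridge" becomes "fridge".
counts = error_breakdown("How old is the Brooklyn Bridge", "how old is Brooklyn fridge")
errors = counts["substitution"] + counts["deletion"] + counts["insertion"]
print(counts, errors / 6)  # two errors against six reference words
```

Separating the error types helps decide what to try next: many substitutions on domain terms and many deletions in noisy passages point to different kinds of problems.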