You can configure Speech-to-Text to report an accuracy estimate,
or confidence level, for
individual words in a transcription.
Preview: This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section
of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support.
For more information, see the launch stage descriptions.
Word-level confidence
When Speech-to-Text transcribes an audio clip, it also
estimates how accurate the response is. The response
sent from Speech-to-Text states the confidence level for
the entire transcription as a number between 0.0 and 1.0.
The following sample shows the confidence level
value returned by Speech-to-Text.
    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "how old is the Brooklyn Bridge",
              "confidence": 0.96748614
            }
          ]
        }
      ]
    }
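As a rough illustration of reading this value, the following sketch parses a response shaped like the sample above and pulls out the transcript and its overall confidence. It uses only the Python standard library. The helper name `top_alternatives` is hypothetical and is not part of any Speech-to-Text client library.

```python
import json

# Sample response copied from the example above.
RESPONSE_JSON = """
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.96748614
        }
      ]
    }
  ]
}
"""


def top_alternatives(response):
    """Return (transcript, confidence) for the first alternative of each result.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            best = alternatives[0]
            pairs.append((best.get("transcript", ""), best.get("confidence", 0.0)))
    return pairs


response = json.loads(RESPONSE_JSON)
for transcript, confidence in top_alternatives(response):
    print(f"{transcript!r}: {confidence:.2f}")
```

The sketch reads only the first alternative of each result, which is an assumption about what a caller typically wants.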
In addition to the confidence level of the entire transcription,
Speech-to-Text can also provide the confidence level of
individual words within the transcription. The response then
includes WordInfo details in the transcription,
indicating the confidence level for individual words as shown in the
following example.
    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "how old is the Brooklyn Bridge",
              "confidence": 0.98360395,
              "words": [
                {
                  "startOffset": "0s",
                  "endOffset": "0.300s",
                  "word": "how",
                  "confidence": SOME NUMBER
                },
                ...
              ]
            }
          ]
        }
      ]
    }
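One possible use of per-word values is to flag words that may need review. The sketch below assumes an alternative shaped like the example above, with illustrative confidence numbers filled in. The 0.98 threshold and the helper name `low_confidence_words` are assumptions made for this example, not recommendations from the API.

```python
# Illustrative alternative shaped like the word-level response above.
# The confidence numbers are sample data, not guaranteed API output.
ALTERNATIVE = {
    "transcript": "how old is the Brooklyn Bridge",
    "confidence": 0.98360395,
    "words": [
        {"startOffset": "0s", "endOffset": "0.300s", "word": "how", "confidence": 0.98762906},
        {"startOffset": "0.300s", "endOffset": "0.600s", "word": "old", "confidence": 0.96929157},
        {"startOffset": "0.600s", "endOffset": "0.800s", "word": "is", "confidence": 0.98271006},
        {"startOffset": "0.800s", "endOffset": "0.900s", "word": "the", "confidence": 0.98271006},
        {"startOffset": "0.900s", "endOffset": "1.100s", "word": "Brooklyn", "confidence": 0.98762906},
        {"startOffset": "1.100s", "endOffset": "1.500s", "word": "Bridge", "confidence": 0.98762906},
    ],
}


def low_confidence_words(alternative, threshold=0.98):
    """Return (word, confidence) pairs whose confidence is below the threshold.

    Hypothetical helper; the default threshold is an arbitrary example value.
    """
    return [
        (info["word"], info["confidence"])
        for info in alternative.get("words", [])
        if info.get("confidence", 0.0) < threshold
    ]


for word, confidence in low_confidence_words(ALTERNATIVE):
    print(f"Review '{word}' (confidence {confidence:.3f})")
```

A suitable threshold depends on the audio quality and on how costly a transcription error is for the application, so treat the value here as a placeholder.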
Enable word-level confidence in a request
The following code samples demonstrate how to enable word-level
confidence in a transcription request to Speech-to-Text using local and remote files.
Use a local file
Protocol
Refer to the speech:recognize
API endpoint for complete details.
To perform synchronous speech recognition, make a POST request and provide the
appropriate request body. The example in this section uses curl and
the Google Cloud CLI to generate an access
token. For instructions on installing the gcloud CLI,
see the quickstart.
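For readers who prefer to assemble the request programmatically, the following sketch builds (but does not send) an equivalent POST request with Python's standard library. The endpoint pattern and body fields follow the curl request on this page. The helper name `build_recognize_request`, the project and recognizer values, and the `ACCESS_TOKEN` string are placeholders assumed for illustration.

```python
import json
import urllib.request

# Endpoint pattern taken from this page's curl request; the braces are
# filled with placeholder values below.
ENDPOINT = (
    "https://speech.googleapis.com/v2/projects/{project}"
    "/locations/global/recognizers/{recognizer}:recognize"
)


def build_recognize_request(project, recognizer, access_token, audio_uri):
    """Build a POST request whose body enables word-level confidence.

    Hypothetical helper for illustration; it only constructs the request.
    """
    body = {
        "config": {
            "features": {
                "enableWordTimeOffsets": True,
                "enableWordConfidence": True,
            }
        },
        "uri": audio_uri,
    }
    return urllib.request.Request(
        ENDPOINT.format(project=project, recognizer=recognizer),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Placeholder values; a real token could come from the gcloud CLI.
request = build_recognize_request(
    "my-project",
    "my-recognizer",
    "ACCESS_TOKEN",
    "gs://cloud-samples-tests/speech/brooklyn.flac",
)
print(request.get_method(), request.full_url)
```

The sketch stops before sending so that it runs without credentials. Passing the request to `urllib.request.urlopen` would submit it, assuming a valid token and an existing recognizer.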
The following example shows how to send a POST request using curl,
where the body of the request enables word-level confidence.

```bash
curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v2/projects/{project}/locations/global/recognizers/{recognizers}:recognize \
    --data '{
    "config": {
      "features": {
        "enableWordTimeOffsets": true,
        "enableWordConfidence": true
      }
    },
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
}' > word-level-confidence.txt
```

If the request is successful, the server returns a 200 OK HTTP
status code and the response in JSON format, saved to a file
named word-level-confidence.txt.

```
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startTime": "0s",
              "endTime": "0.300s",
              "word": "how",
              "confidence": 0.98762906
            },
            {
              "startTime": "0.300s",
              "endTime": "0.600s",
              "word": "old",
              "confidence": 0.96929157
            },
            {
              "startTime": "0.600s",
              "endTime": "0.800s",
              "word": "is",
              "confidence": 0.98271006
            },
            {
              "startTime": "0.800s",
              "endTime": "0.900s",
              "word": "the",
              "confidence": 0.98271006
            },
            {
              "startTime": "0.900s",
              "endTime": "1.100s",
              "word": "Brooklyn",
              "confidence": 0.98762906
            },
            {
              "startTime": "1.100s",
              "endTime": "1.500s",
              "word": "Bridge",
              "confidence": 0.98762906
            }
          ]
        }
      ],
      "languageCode": "en-us"
    }
  ]
}
```

Python

To learn how to install and use the client library for Speech-to-Text, see
Speech-to-Text client libraries.
For more information, see the Speech-to-Text Python API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see Set up authentication for a local development environment.

```python
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

speech_file = "resources/Google_Gnome.wav"

# Read the local audio file into memory.
with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

# enable_word_confidence asks the service for per-word confidence values.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_confidence=True,
)

response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    # The first alternative is the most likely transcription.
    alternative = result.alternatives[0]
    first_word = alternative.words[0]
    print("-" * 20)
    print(f"First alternative of result {i}")
    print(f"Transcript: {alternative.transcript}")
    print(f"First Word and Confidence: ({first_word.word}, {first_word.confidence})")
```

Last updated 2025-08-29 UTC.