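Single utterance behavior
Preview: This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support.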
Speech-to-Text provides the latest_short model for recognizing speech that
consists of a single utterance. This can be useful for applications where users
issue single voice commands rather than long-form monologue or dictation.
When a recognition request uses a recognizer configured with the latest_short
model, Speech-to-Text stops performing recognition once it detects that the
utterance has finished. Speech-to-Text then returns a speech activity event
response with the type END_OF_SINGLE_UTTERANCE, followed by the
transcription results.
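The response order described above can be modeled with a minimal sketch. The Response and SpeechEventType classes here are simplified, hypothetical stand-ins rather than the client library's types; only the END_OF_SINGLE_UTTERANCE name comes from this page. The sketch assumes a caller wants to record that the utterance ended and keep whatever transcripts follow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Tuple


class SpeechEventType(Enum):
    # Stand-in enum; only END_OF_SINGLE_UTTERANCE is named on this page.
    SPEECH_EVENT_TYPE_UNSPECIFIED = 0
    END_OF_SINGLE_UTTERANCE = 1


@dataclass
class Response:
    # Hypothetical, simplified response: an event type plus any transcripts.
    speech_event_type: SpeechEventType = SpeechEventType.SPEECH_EVENT_TYPE_UNSPECIFIED
    transcripts: List[str] = field(default_factory=list)


def collect_single_utterance(responses: Iterable[Response]) -> Tuple[bool, str]:
    """Return (utterance_ended, transcript) for a sequence of responses."""
    utterance_ended = False
    parts: List[str] = []
    for response in responses:
        # The event tells the caller that recognition has stopped.
        if response.speech_event_type is SpeechEventType.END_OF_SINGLE_UTTERANCE:
            utterance_ended = True
        # Transcription results may arrive after the event, so keep reading.
        parts.extend(response.transcripts)
    return utterance_ended, " ".join(parts).strip()


# Simulated responses in the order described above: event first, then results.
simulated = [
    Response(speech_event_type=SpeechEventType.END_OF_SINGLE_UTTERANCE),
    Response(transcripts=["turn on the lights"]),
]
ended, text = collect_single_utterance(simulated)
print(ended, text)
```

The loop does not return as soon as it sees the event, because the transcription results follow it.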
Single utterance and StreamingRecognize
When a Recognizer that uses the latest_short model is selected for a
StreamingRecognize request, Speech-to-Text closes the stream automatically
after the utterance ends.
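Because the service closes the stream on its own, a client may want to stop producing audio once it learns that the utterance has ended. The following is one possible client-side pattern, not behavior documented on this page: a generator that yields audio chunks until a stop flag is set. The audio_requests function and the stop flag are hypothetical names, and the chunks are placeholder bytes rather than real audio.

```python
import threading
from typing import Iterable, Iterator


def audio_requests(chunks: Iterable[bytes], stop: threading.Event) -> Iterator[bytes]:
    """Yield audio chunks until `stop` is set."""
    for chunk in chunks:
        # Check the flag before each chunk so no audio is sent after the
        # caller has signalled that the utterance ended.
        if stop.is_set():
            break
        yield chunk


stop = threading.Event()
sent = []
for i, chunk in enumerate(audio_requests([b"a", b"b", b"c", b"d"], stop)):
    sent.append(chunk)
    if i == 1:
        # Assumption for the demo: the end-of-utterance event arrives here.
        stop.set()
print(sent)
```

In a real client the flag would typically be set by the code that reads responses, which often runs on a different thread from the audio source; that is why the sketch uses threading.Event rather than a plain boolean.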
With voice activity events
When voice activity events are also enabled for a StreamingRecognize request,
Speech-to-Text still returns the speech begin and speech end voice activity
events.
Voice activity timeouts for speech begin still apply. Voice activity timeouts
for speech end do not apply, because the stream closes as soon as the
utterance ends.
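The timeout rule above can be written down as a small model. This is an illustrative sketch only: VoiceActivityTimeouts and effective_timeouts are hypothetical names, the values are in seconds by assumption, and the code does not call the Speech-to-Text API. It shows which requested timeouts would still have an effect for a given model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceActivityTimeouts:
    # Hypothetical container; None means "no timeout in effect".
    speech_start_timeout_s: Optional[float] = None
    speech_end_timeout_s: Optional[float] = None


def effective_timeouts(model: str, requested: VoiceActivityTimeouts) -> VoiceActivityTimeouts:
    """Return the timeouts that still have an effect for `model`."""
    if model == "latest_short":
        # The speech begin timeout still applies. The speech end timeout does
        # not, because the stream closes as soon as the utterance ends.
        return VoiceActivityTimeouts(
            speech_start_timeout_s=requested.speech_start_timeout_s,
            speech_end_timeout_s=None,
        )
    # Assumption: for other models both requested timeouts are kept as-is.
    return requested


requested = VoiceActivityTimeouts(speech_start_timeout_s=5.0, speech_end_timeout_s=2.0)
short = effective_timeouts("latest_short", requested)
print(short)
```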
Last updated 2025-08-28 UTC.