Contains a speech recognition result corresponding to a portion of the audio that is currently being processed or an indication that this is the end of the single requested utterance.
Example:
transcript: "tube"
transcript: "to be a"
transcript: "to be"
transcript: "to be or not to be" is_final: true
transcript: " that's"
transcript: " that is"
message_type:
END_OF_SINGLE_UTTERANCE
transcript: " that is the question" is_final: true
Only two of the responses contain final results (#4 and #8 indicated by
is_final: true
). Concatenating these generates the full transcript:
"to be or not to be that is the question".
In each response we populate:
for
TRANSCRIPT
:transcript
and possiblyis_final
.for
END_OF_SINGLE_UTTERANCE
: onlymessage_type
.Transcript text representing the words that the user spoke. Populated if and only if
message_type
=TRANSCRIPT
.The Speech confidence between 0.0 and 1.0 for the current portion of audio. A higher number indicates an estimated greater likelihood that the recognized words are correct. The default of 0.0 is a sentinel value indicating that confidence was not set. This field is typically only provided if
is_final
is true and you should not rely on it being accurate or even set.Time offset of the end of this Speech recognition result relative to the beginning of the audio. Only populated for
message_type
=TRANSCRIPT
.