Contains a speech recognition result corresponding to a portion of the
audio that is currently being processed or an indication that this is
the end of the single requested utterance.

Example:

1. transcript: "tube"
2. transcript: "to be a"
3. transcript: "to be"
4. transcript: "to be or not to be" is_final: true
5. transcript: " that's"
6. transcript: " that is"
7. message_type: END_OF_SINGLE_UTTERANCE
8. transcript: " that is the question" is_final: true

Only two of the responses contain final results (#4 and #8, indicated
by is_final: true). Concatenating these generates the full transcript:
"to be or not to be that is the question".

In each response we populate:

- for TRANSCRIPT: transcript and possibly is_final.
- for END_OF_SINGLE_UTTERANCE: only message_type.
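
The concatenation described above can be sketched in a few lines of Python. This is a minimal illustration, not the client library's API: the `Result` dataclass and the `full_transcript` helper are hypothetical stand-ins that mirror only the three fields named in the example (message_type, transcript, is_final).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Result:
    """Hypothetical stand-in for one streamed recognition result."""
    message_type: str = "TRANSCRIPT"
    transcript: str = ""
    is_final: bool = False


def full_transcript(results: Iterable[Result]) -> str:
    """Join the transcripts of final TRANSCRIPT results, in arrival order."""
    return "".join(
        r.transcript
        for r in results
        if r.message_type == "TRANSCRIPT" and r.is_final
    )


# The eight responses from the example above.
responses = [
    Result(transcript="tube"),
    Result(transcript="to be a"),
    Result(transcript="to be"),
    Result(transcript="to be or not to be", is_final=True),
    Result(transcript=" that's"),
    Result(transcript=" that is"),
    Result(message_type="END_OF_SINGLE_UTTERANCE"),
    Result(transcript=" that is the question", is_final=True),
]

print(full_transcript(responses))  # to be or not to be that is the question
```

The leading space in the later transcripts is what lets a plain join produce correctly spaced text, so the sketch does not strip or re-space the pieces.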
Transcript text representing the words that the user spoke.
Populated if and only if message_type = TRANSCRIPT.
The speech recognition confidence, between 0.0 and 1.0, for the
current portion of audio. A higher number indicates a greater
estimated likelihood that the recognized words are correct. The
default of 0.0 is a sentinel value indicating that confidence
was not set. This field is typically provided only if is_final
is true, and you should not rely on it being accurate or even set.
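
Because 0.0 doubles as the "not set" sentinel, a caller may want to separate "no confidence reported" from a real score. The helper below is one possible sketch of that check. The function name `reported_confidence` is hypothetical, and ignoring scores on non-final results is an assumption made here for caution, not documented behavior.

```python
from typing import Optional


def reported_confidence(confidence: float, is_final: bool) -> Optional[float]:
    """Return the confidence only when it looks like it was actually set.

    Assumptions (not guaranteed by the API): 0.0 always means "unset",
    and scores on non-final results are ignored.
    """
    if not is_final or confidence == 0.0:
        return None
    return confidence


print(reported_confidence(0.0, True))    # None
print(reported_confidence(0.87, True))   # 0.87
print(reported_confidence(0.5, False))   # None
```

Returning `None` rather than 0.0 keeps an unset score from being mistaken for a very low one in downstream thresholds.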
Time offset of the end of this speech recognition result
relative to the beginning of the audio. Only populated for
message_type = TRANSCRIPT.