Time offset, relative to the beginning of the audio, that
corresponds to the end of the spoken word. This field is
only set if enable_word_time_offsets=true and only in the
top hypothesis. This is an experimental feature and the
accuracy of the time offset can vary.
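
As a rough illustration of handling this optional field, the sketch below converts a word's end offset to seconds and returns None when the offset was not set. The Word class, its end_offset attribute, and the end_seconds helper are hypothetical stand-ins, not names from any client library, and the offset is assumed to be available as a datetime.timedelta.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Word:
    """Hypothetical stand-in for a recognized word (not a real API type)."""
    text: str
    # Assumed to be unset unless enable_word_time_offsets=true was requested.
    end_offset: Optional[timedelta] = None


def end_seconds(word: Word) -> Optional[float]:
    """Return the word's end offset in seconds, or None if it was not set."""
    if word.end_offset is None:
        return None
    return word.end_offset.total_seconds()
```

Because the accuracy of the offset can vary, callers may prefer to treat the result as approximate, for example by rounding it before display.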
The confidence estimate between 0.0 and 1.0. A higher number
indicates an estimated greater likelihood that the recognized
words are correct. This field is set only for the top
alternative of a non-streaming result, or of a streaming
result where is_final=true. This field is not guaranteed
to be accurate, and users should not rely on it always being
provided. The default of 0.0 is a sentinel value indicating
confidence was not set.
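
One way to honor the sentinel described above is to map 0.0 to an explicit "not set" value before using the number. The sketch below is a minimal example under that assumption. The helper name confidence_or_none is hypothetical, and the range check is an added safeguard rather than something the field description requires.

```python
from typing import Optional


def confidence_or_none(confidence: float) -> Optional[float]:
    """Return the confidence estimate, or None when it was not set."""
    # 0.0 is the sentinel meaning "confidence was not set".
    if confidence == 0.0:
        return None
    # Added safeguard (an assumption): reject values outside the documented range.
    if not 0.0 < confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return confidence
```

Returning None keeps "not set" distinct from a genuinely low estimate, so downstream code does not mistake a missing value for zero confidence.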