# Best practices

Last updated (UTC): 2025-08-18.

This document contains recommendations on how to provide speech data to the
Speech-to-Text API. These guidelines are designed for greater efficiency
and accuracy, as well as reasonable response times from the service. The
Speech-to-Text API works best when the data sent to the service is within the
parameters described in this document.

If you follow these guidelines and don't get the results that you expect from
the API, see [Troubleshooting & Support](/speech-to-text/docs/support).

Sampling rate
-------------

If possible, set the sampling rate of the audio source to 16000 Hz.

For headerless codecs, use the
[explicit_decoding_config](/speech-to-text/v2/docs/reference/rpc/google.cloud.speech.v2#explicitdecodingconfig)
setting in
[RecognitionConfig](/speech-to-text/v2/docs/reference/rpc/google.cloud.speech.v2#recognitionconfig)
to set `sample_rate_hertz` to match the native sample rate of the audio source
(instead of re-sampling).

For codecs with a header, use the
[auto_decoding_config](/speech-to-text/v2/docs/reference/rpc/google.cloud.speech.v2#autodetectdecodingconfig)
setting in
[RecognitionConfig](/speech-to-text/v2/docs/reference/rpc/google.cloud.speech.v2#recognitionconfig)
to automatically choose the correct sampling rate.

Frame size
----------

Streaming recognition recognizes live audio as it is captured from a microphone
or other audio source. The audio stream is split into frames and sent in
consecutive `StreamingRecognizeRequest` messages.
Any frame size is acceptable. Larger frames are more efficient, but they add
latency. A 100-millisecond frame size is recommended as a good tradeoff between
latency and efficiency.

Audio preprocessing
-------------------

It's best to provide audio that is as clean as possible by using a good-quality,
well-positioned microphone. However, applying noise-reduction signal processing
to the audio before sending it to the service typically reduces recognition
accuracy. The service is designed to handle noisy audio.

For best results:

- Position the microphone as close as possible to the person who is speaking,
  particularly when background noise is present.
- Avoid audio clipping.
- Don't use automatic gain control (AGC).
- Disable all noise-reduction processing.
- Listen to some sample audio. It should sound clear, without distortion or
  unexpected noise.

Request configuration
---------------------

Make sure that you accurately describe the audio data that you send with your
request to the Speech-to-Text API. Ensure that the
[RecognitionConfig](/speech-to-text/v2/docs/reference/rpc/google.cloud.speech.v2#recognitionconfig)
for your request specifies the correct `sampleRateHertz` and `encoding`, and
that you use a
[Recognizer](/speech-to-text/v2/docs/reference/rpc/google.cloud.speech.v2#recognizer)
with the correct `language_codes` and `model`. This results in the most accurate
transcription and billing for your request.

What's next
-----------

- Use [client libraries](/speech-to-text/v2/docs/transcribe-client-libraries) to transcribe audio using your favorite programming language.
- Practice [transcribing short audio files](/speech-to-text/v2/docs/sync-recognize).
- Learn how to [transcribe streaming audio](/speech-to-text/v2/docs/streaming-recognize).
- Learn how to [transcribe long audio files](/speech-to-text/v2/docs/batch-recognize).
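As a concrete illustration of the sampling-rate and frame-size guidance above, the arithmetic for sizing each streaming frame can be sketched as follows. This is a minimal, self-contained sketch, not part of the client library; the helper names are ours, and it assumes 16-bit (2-byte) mono LINEAR16 audio.

```python
# Recommended defaults from the guidance above.
SAMPLE_RATE_HZ = 16000   # recommended sampling rate
FRAME_MS = 100           # recommended frame duration (latency/efficiency tradeoff)
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono LINEAR16 samples


def frame_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     frame_ms: int = FRAME_MS,
                     bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Bytes of audio to send per StreamingRecognizeRequest frame."""
    # samples per frame = sample rate * frame duration in seconds
    return sample_rate_hz * frame_ms // 1000 * bytes_per_sample


def split_into_frames(audio: bytes, frame_bytes: int):
    """Yield consecutive frames of audio; the final frame may be shorter."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


# At 16 kHz, 16-bit mono, a 100 ms frame is 16000 * 0.1 * 2 = 3200 bytes.
# Each yielded frame would be sent in one StreamingRecognizeRequest message.
```

A shorter frame lowers latency at the cost of more request overhead; doubling `FRAME_MS` halves the number of messages but delays results by up to the extra frame duration.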
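To tie the sampling-rate and request-configuration guidance together, here is a hedged configuration sketch using the `google.cloud.speech_v2` Python client types. The field names follow the v2 RPC reference linked above, but treat the exact client usage as an assumption and verify it against the client library documentation; this fragment only builds a config and does not make a request.

```python
# Sketch (assumption: the google-cloud-speech v2 Python client is installed).
from google.cloud.speech_v2.types import cloud_speech

# For headerless codecs (e.g. raw LINEAR16): declare the native sample rate
# of the audio source explicitly instead of re-sampling.
explicit = cloud_speech.ExplicitDecodingConfig(
    encoding=cloud_speech.ExplicitDecodingConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
)

config = cloud_speech.RecognitionConfig(
    explicit_decoding_config=explicit,
    # For codecs with a header (e.g. FLAC, WAV), use this instead so the
    # service reads the sampling rate from the header:
    # auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
)

# In v2, language_codes and model are configured on the Recognizer resource
# used with the request (see the Recognizer reference linked above).
```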