Synthesize speech with bidirectional streaming

This document walks you through synthesizing audio with bidirectional streaming.

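Bidirectional streaming lets you send text input and receive audio data at the same time. Speech synthesis can therefore begin before the full input text has been sent, which reduces latency and supports real-time interaction. Applications such as voice assistants and interactive games can use bidirectional streaming to be more dynamic and responsive.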
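To make the "send and receive at the same time" idea concrete, here is a toy simulation that uses no network and no real API. The `fake_streaming_synthesize` function is hypothetical: it stands in for the service and returns placeholder bytes for each text chunk as soon as that chunk arrives. The event log shows audio for the first chunk being received before the second chunk is sent. The real service runs over a concurrent gRPC stream, but the ordering it illustrates is the same.

```python
"""Toy simulation of bidirectional streaming (no network, no real API)."""
from typing import Iterable, Iterator, List


def fake_streaming_synthesize(text_chunks: Iterable[str], log: List[str]) -> Iterator[bytes]:
    """Hypothetical stand-in for the service: one audio chunk per text chunk."""
    for chunk in text_chunks:
        log.append(f"synthesized: {chunk!r}")
        # Placeholder "audio": 2 bytes per character (not a real audio format).
        yield b"\x00" * (2 * len(chunk))


def send_text(chunks: List[str], log: List[str]) -> Iterator[str]:
    """Yield text chunks lazily, recording when each one is sent."""
    for chunk in chunks:
        log.append(f"sent: {chunk!r}")
        yield chunk


log: List[str] = []
chunks = ["Hello there. ", "How are you today?"]
for audio in fake_streaming_synthesize(send_text(chunks, log), log):
    log.append(f"received {len(audio)} bytes")

for line in log:
    print(line)
```

Because both sides are generators, the second chunk is not pulled until the audio for the first chunk has been consumed, which mirrors how streaming lets playback start early.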

For more information about the basic concepts of Text-to-Speech, see Text-to-Speech basics.

Before you begin

Before you can send requests to the Text-to-Speech API, complete the following tasks. For details, see the Before you begin page.

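1. Enable Text-to-Speech in your Google Cloud project.
2. Make sure that billing is enabled for Text-to-Speech.
3. Install the Google Cloud CLI. If you use an external identity provider (IdP), first sign in to the gcloud CLI with your federated identity. Then initialize the Google Cloud CLI by running the following command:
```
gcloud init
```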

Synthesize speech with bidirectional streaming

Install the client library

Python

Before installing the library, make sure your environment is set up for Python development.

```
pip install --upgrade google-cloud-texttospeech
```

Send a text stream and receive an audio stream

The API accepts a stream of requests of type StreamingSynthesizeRequest. Each request contains either a StreamingSynthesizeConfig or a StreamingSynthesisInput.

Send exactly one StreamingSynthesizeRequest containing a StreamingSynthesizeConfig, and send it first. Every StreamingSynthesizeRequest after it must carry text input in a StreamingSynthesisInput.
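The ordering rule (one config request first, then text-only requests) can be sketched and checked in plain Python. `ConfigRequest`, `InputRequest`, `build_request_stream`, and `validate_request_order` are hypothetical names used only so this runs without the client library; in real code the two request kinds are `StreamingSynthesizeRequest(streaming_config=...)` and `StreamingSynthesizeRequest(input=StreamingSynthesisInput(text=...))`.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


# Hypothetical stand-ins for the two StreamingSynthesizeRequest payloads.
# They are not part of the real API.
@dataclass
class ConfigRequest:
    voice_name: str


@dataclass
class InputRequest:
    text: str


Request = Union[ConfigRequest, InputRequest]


def build_request_stream(config: ConfigRequest, texts: Iterable[str]) -> Iterator[Request]:
    """Yield the single config request first, then one input request per text chunk."""
    yield config
    for text in texts:
        yield InputRequest(text=text)


def validate_request_order(requests: Iterable[Request]) -> None:
    """Raise ValueError unless the stream starts with its only config request."""
    count = 0
    for index, request in enumerate(requests):
        is_config = isinstance(request, ConfigRequest)
        if index == 0 and not is_config:
            raise ValueError("the first request must carry the config")
        if index > 0 and is_config:
            raise ValueError(f"request {index} sends a second config")
        count += 1
    if count == 0:
        raise ValueError("empty stream: a config request is required")


stream = list(build_request_stream(ConfigRequest("en-US-Chirp3-HD-Charon"), ["Hello ", "world."]))
validate_request_order(stream)  # valid: config first, then text only
```

Building the config into the generator itself, as the sample's `request_generator()` does, makes it hard to get the order wrong in the first place.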

Text-to-Speech streaming supports only Chirp 3: HD voices.
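Because only Chirp 3: HD voices work with streaming, it can help to reject other voices before opening a stream. This sketch assumes, based on the voice name in the Python sample on this page (`en-US-Chirp3-HD-Charon`), that Chirp 3: HD voice names contain the segment `-Chirp3-HD-`. That is a naming heuristic, not an API check; the voice list remains the authoritative source. Both function names are hypothetical.

```python
def is_chirp3_hd_voice(voice_name: str) -> bool:
    """Return True if voice_name looks like a Chirp 3: HD voice.

    Assumption: Chirp 3: HD names contain "-Chirp3-HD-", as in
    "en-US-Chirp3-HD-Charon". This is a heuristic, not an API check.
    """
    return "-Chirp3-HD-" in voice_name


def require_streaming_voice(voice_name: str) -> str:
    """Fail fast, before opening a stream, if the voice is not Chirp 3: HD."""
    if not is_chirp3_hd_voice(voice_name):
        raise ValueError(f"{voice_name!r} is not a Chirp 3: HD voice; streaming requires one")
    return voice_name


print(require_streaming_voice("en-US-Chirp3-HD-Charon"))
```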

Python

Before running the sample, make sure your Python development environment is set up.

```python
#!/usr/bin/env python
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""Google Cloud Text-to-Speech API streaming sample application.

Example usage:
    python streaming_tts_quickstart.py
"""


def run_streaming_tts_quickstart():
    """Synthesizes speech from a stream of input text."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    # See https://cloud.google.com/text-to-speech/docs/voices for all voices.
    streaming_config = texttospeech.StreamingSynthesizeConfig(
        voice=texttospeech.VoiceSelectionParams(
            name="en-US-Chirp3-HD-Charon",
            language_code="en-US",
        )
    )

    # Set the config for your stream. The first request must contain your
    # config, and each subsequent request must contain text.
    config_request = texttospeech.StreamingSynthesizeRequest(
        streaming_config=streaming_config
    )

    text_iterator = [
        "Hello there. ",
        "How are you ",
        "today? It's ",
        "such nice weather outside.",
    ]

    # Request generator. Consider using Gemini or another LLM with streaming
    # output as the source of text chunks.
    def request_generator():
        yield config_request
        for text in text_iterator:
            yield texttospeech.StreamingSynthesizeRequest(
                input=texttospeech.StreamingSynthesisInput(text=text)
            )

    streaming_responses = client.streaming_synthesize(request_generator())

    # Audio arrives in chunks while text is still being sent.
    for response in streaming_responses:
        print(f"Audio content size in bytes is: {len(response.audio_content)}")


if __name__ == "__main__":
    run_streaming_tts_quickstart()
```
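The sample only prints the size of each audio chunk. If you want to keep the audio, one option is to concatenate the `audio_content` bytes in arrival order, as in the following sketch. The helper name `collect_audio` is hypothetical, and stub objects stand in for the responses from `client.streaming_synthesize(...)` so the sketch runs without credentials. In the sample, you would call `collect_audio(streaming_responses)` in place of the print loop.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_audio(responses: Iterable) -> bytes:
    """Concatenate each response's audio_content bytes in the order they arrive."""
    chunks = []
    for response in responses:
        # Each streaming response carries the next piece of synthesized audio.
        chunks.append(response.audio_content)
    return b"".join(chunks)


# Stub responses standing in for client.streaming_synthesize(...) output.
fake_responses = [
    SimpleNamespace(audio_content=b"\x01\x02"),
    SimpleNamespace(audio_content=b""),
    SimpleNamespace(audio_content=b"\x03"),
]
audio = collect_audio(fake_responses)
print(len(audio))
```

This page does not state the encoding of the returned bytes, so check the API reference before saving them as a particular file format. Buffering the whole response also gives up the latency benefit of streaming; for real-time playback, pass each chunk to your audio output as it arrives.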

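The sample's comment suggests feeding the request generator from an LLM with streaming output. The sample itself sends mid-sentence fragments such as "How are you ", so buffering is optional. If you prefer to send sentence-sized chunks instead of individual tokens, a small adapter like the following could sit between the token stream and the request generator. `chunk_on_punctuation` is a hypothetical helper, and whether sentence-sized chunks improve prosody or latency is an assumption to measure, not something this page states.

```python
from typing import Iterable, Iterator


def chunk_on_punctuation(tokens: Iterable[str], enders: str = ".!?") -> Iterator[str]:
    """Group streamed tokens into chunks that end at sentence punctuation.

    Buffers tokens (for example, from an LLM's streaming output) and yields the
    buffer whenever it ends with one of `enders`. Trailing text is flushed last.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(tuple(enders)):
            yield buffer
            buffer = ""
    if buffer:
        yield buffer


# In the sample's request_generator(), you could iterate over
# chunk_on_punctuation(llm_tokens) instead of text_iterator.
llm_tokens = ["Hello", " there", ".", " How", " are", " you", "?"]
for text in chunk_on_punctuation(llm_tokens):
    print(repr(text))
```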
Clean up

To avoid unnecessary Google Cloud Platform charges, use the Google Cloud console to delete your project if you no longer need it.

What's next

Learn more about Cloud Text-to-Speech by reading the basics.
Review the list of voices available for synthesized speech.