Synthesize speech with bidirectional streaming

This document walks you through synthesizing audio using bidirectional streaming.

Bidirectional streaming lets you send text input and receive audio data at the same time. This means you can begin synthesizing speech before the complete input text has been sent, which reduces latency and enables real-time interaction. Voice assistants and interactive games use bidirectional streaming to build more dynamic and responsive applications.

To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech basics.

Before you begin

Before you can send a request to the Text-to-Speech API, you must complete the following steps. See the before you begin page for details.

1. Enable Text-to-Speech on a Google Cloud project.
2. Make sure billing is enabled for Text-to-Speech.
3. Install the Google Cloud CLI, then sign in to the gcloud CLI with your federated identity. After signing in, initialize the Google Cloud CLI by running the following command:
```
gcloud init
```

Synthesize speech with bidirectional streaming

Install the client library

Python

Before installing the library, make sure you have set up your environment for Python development.

```
pip install --upgrade google-cloud-texttospeech
```

Send a text stream and receive an audio stream

The API accepts a stream of requests of type StreamingSynthesizeRequest, each of which contains either a StreamingSynthesisInput or a StreamingSynthesizeConfig.

Before sending the stream of StreamingSynthesizeRequest messages that carry StreamingSynthesisInput (the text input), send exactly one StreamingSynthesizeRequest that carries a StreamingSynthesizeConfig.
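Because the text arrives as a series of StreamingSynthesisInput messages, text that is produced incrementally (for example, by an LLM with output streaming) can be grouped into pieces before each piece is wrapped in a request. The following is a minimal sketch of one possible grouping strategy. The name `sentence_chunks` and the choice to flush on sentence punctuation are illustrative assumptions, not part of the Text-to-Speech API.

```python
from typing import Iterable, Iterator

# Punctuation treated as a sentence boundary (an assumption for this sketch).
SENTENCE_ENDINGS = (".", "!", "?")


def sentence_chunks(fragments: Iterable[str]) -> Iterator[str]:
    """Group incoming text fragments into pieces that end at a sentence boundary.

    Hypothetical helper: it buffers fragments and flushes the buffer only when
    the accumulated text ends with sentence punctuation.
    """
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        # Flush when the buffered text ends a sentence (ignoring trailing spaces).
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            yield buffer
            buffer = ""
    # Flush whatever remains once the upstream source is exhausted.
    if buffer:
        yield buffer
```

Each yielded piece could then be passed as the `text` of a StreamingSynthesisInput, after the single config request has been sent.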

Streaming Text-to-Speech is only compatible with Chirp 3: HD voices.
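To pick a compatible voice programmatically, one option is to filter voice names by their naming pattern. The sketch below assumes that Chirp 3: HD voice names contain the marker `Chirp3-HD`, as in the voice name `en-US-Chirp3-HD-Charon`. The helper name `chirp3_hd_voice_names` is hypothetical, and the naming pattern should be verified against the current list of voices.

```python
from typing import Iterable, List

# Assumed marker, inferred from voice names such as "en-US-Chirp3-HD-Charon".
CHIRP3_HD_MARKER = "Chirp3-HD"


def chirp3_hd_voice_names(voice_names: Iterable[str]) -> List[str]:
    """Return only the voice names that contain the assumed Chirp 3: HD marker."""
    return [name for name in voice_names if CHIRP3_HD_MARKER in name]
```

With the client library installed and credentials configured, the input could come from `client.list_voices(language_code="en-US").voices`, passing each voice's `name` to the helper.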

Python

Before running the example, make sure you have set up your environment for Python development.

```python
#!/usr/bin/env python
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""Google Cloud Text-To-Speech API streaming sample application.

Example usage:
    python streaming_tts_quickstart.py
"""


def run_streaming_tts_quickstart():
    """Synthesizes speech from a stream of input text."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    # See https://cloud.google.com/text-to-speech/docs/voices for all voices.
    streaming_config = texttospeech.StreamingSynthesizeConfig(
        voice=texttospeech.VoiceSelectionParams(
            name="en-US-Chirp3-HD-Charon",
            language_code="en-US",
        )
    )

    # Set the config for your stream. The first request must contain your
    # config, and then each subsequent request must contain text.
    config_request = texttospeech.StreamingSynthesizeRequest(
        streaming_config=streaming_config
    )

    text_iterator = [
        "Hello there. ",
        "How are you ",
        "today? It's ",
        "such nice weather outside.",
    ]

    # Request generator. Consider using Gemini or another LLM with output
    # streaming as a generator.
    def request_generator():
        yield config_request
        for text in text_iterator:
            yield texttospeech.StreamingSynthesizeRequest(
                input=texttospeech.StreamingSynthesisInput(text=text)
            )

    streaming_responses = client.streaming_synthesize(request_generator())

    for response in streaming_responses:
        print(f"Audio content size in bytes is: {len(response.audio_content)}")


if __name__ == "__main__":
    run_streaming_tts_quickstart()
```
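The sample above only prints the size of each audio chunk. To listen to the result, the chunks can be collected and written to a file. The following sketch wraps the chunks in a WAV container using the standard library. It assumes that each `audio_content` holds raw 16-bit mono PCM at 24,000 Hz. That format is an assumption, not something this page states, so check it against your streaming configuration. The helper name `pcm_chunks_to_wav` is hypothetical.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate_hz: int = 24000) -> bytes:
    """Wrap raw PCM chunks in a WAV container and return the WAV bytes.

    Assumes 16-bit, single-channel PCM; the 24000 Hz default is an assumption.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono (assumed)
        wav_file.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit (assumed)
        wav_file.setframerate(sample_rate_hz)
        for chunk in chunks:
            # Append each chunk in arrival order.
            wav_file.writeframes(chunk)
    return buffer.getvalue()
```

In the sample, this could be used as `pcm_chunks_to_wav(response.audio_content for response in streaming_responses)`, with the returned bytes written to a `.wav` file opened in binary mode.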

Clean up

To avoid unwanted Google Cloud Platform charges, use the Google Cloud console to delete your project if you no longer need it.

What's next

Learn more about Cloud Text-to-Speech by reading the basics.
Review the list of available voices that you can use for synthetic speech.