Instant Custom Voice in Text-to-Speech lets users create personalized voice models from their own high-quality audio recordings. Personal voices can be generated rapidly and then used to synthesize audio through the Cloud TTS API, with support for both streaming and long-form text.
Due to safety considerations, access to this voice cloning capability is restricted to allow-listed users. To access this feature, contact a member of the sales team to be added to the allow list.
Language availability
Instant Custom Voice creation and synthesis are supported in the following languages:
Language | BCP-47 Code | Consent Statement |
---|---|---|
Arabic (XA) | ar-XA | أنا مالك هذا الصوت وأوافق على أن تستخدم Google هذا الصوت لإنشاء نموذج صوتي اصطناعي. |
Bengali (India) | bn-IN | আমি এই ভয়েসের মালিক এবং আমি একটি সিন্থেটিক ভয়েস মডেল তৈরি করতে এই ভয়েস ব্যবহার করে Google-এর সাথে সম্মতি দিচ্ছি। |
Chinese (China) | cmn-CN | 我是此声音的拥有者并授权谷歌使用此声音创建语音合成模型 |
German (Germany) | de-DE | Ich bin der Eigentümer dieser Stimme und bin damit einverstanden, dass Google diese Stimme zur Erstellung eines synthetischen Stimmmodells verwendet. |
English (Australia) | en-AU | I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model. |
English (UK) | en-GB | I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model. |
English (India) | en-IN | I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model. |
English (US) | en-US | I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model. |
Spanish (Spain) | es-ES | Soy el propietario de esta voz y doy mi consentimiento para que Google la utilice para crear un modelo de voz sintética. |
Spanish (US) | es-US | Soy el propietario de esta voz y doy mi consentimiento para que Google la utilice para crear un modelo de voz sintética. |
French (Canada) | fr-CA | Je suis le propriétaire de cette voix et j'autorise Google à utiliser cette voix pour créer un modèle de voix synthétique. |
French (France) | fr-FR | Je suis le propriétaire de cette voix et j'autorise Google à utiliser cette voix pour créer un modèle de voix synthétique. |
Gujarati (India) | gu-IN | હું આ વોઈસનો માલિક છું અને સિન્થેટિક વોઈસ મોડલ બનાવવા માટે આ વોઈસનો ઉપયોગ કરીને google ને હું સંમતિ આપું છું |
Hindi (India) | hi-IN | मैं इस आवाज का मालिक हूं और मैं सिंथेटिक आवाज मॉडल बनाने के लिए Google को इस आवाज का उपयोग करने की सहमति देता हूं |
Indonesian (Indonesia) | id-ID | Saya pemilik suara ini dan saya menyetujui Google menggunakan suara ini untuk membuat model suara sintetis. |
Italian (Italy) | it-IT | Sono il proprietario di questa voce e acconsento che Google la utilizzi per creare un modello di voce sintetica. |
Kannada (India) | kn-IN | ನಾನು ಈ ಧ್ವನಿಯ ಮಾಲಿಕ ಮತ್ತು ಸಂಶ್ಲೇಷಿತ ಧ್ವನಿ ಮಾದರಿಯನ್ನು ರಚಿಸಲು ಈ ಧ್ವನಿಯನ್ನು ಬಳಸಿಕೊಂಡುಗೂಗಲ್ ಗೆ ನಾನು ಸಮ್ಮತಿಸುತ್ತೇನೆ. |
Korean (Korea) | ko-KR | 나는 이 음성의 소유자이며 구글이 이 음성을 사용하여 음성 합성 모델을 생성할 것을 허용합니다. |
Malayalam (India) | ml-IN | ഈ ശബ്ദത്തിന്റെ ഉടമ ഞാനാണ്, ഒരു സിന്തറ്റിക് വോയ്സ് മോഡൽ സൃഷ്ടിക്കാൻ ഈ ശബ്ദം ഉപയോഗിക്കുന്നതിന് ഞാൻ Google-ന് സമ്മതം നൽകുന്നു. |
Marathi (India) | mr-IN | मी या आवाजाचा मालक आहे आणि सिंथेटिक व्हॉइस मॉडेल तयार करण्यासाठी हा आवाज वापरण्यासाठी मी Google ला संमती देतो |
Dutch (Netherlands) | nl-NL | Ik ben de eigenaar van deze stem en ik geef Google toestemming om deze stem te gebruiken om een synthetisch stemmodel te maken. |
Polish (Poland) | pl-PL | Jestem właścicielem tego głosu i wyrażam zgodę na wykorzystanie go przez Google w celu utworzenia syntetycznego modelu głosu. |
Portuguese (Brazil) | pt-BR | Eu sou o proprietário desta voz e autorizo o Google a usá-la para criar um modelo de voz sintética. |
Russian (Russia) | ru-RU | Я являюсь владельцем этого голоса и даю согласие Google на использование этого голоса для создания модели синтетического голоса. |
Tamil (India) | ta-IN | நான் இந்த குரலின் உரிமையாளர் மற்றும் செயற்கை குரல் மாதிரியை உருவாக்க இந்த குரலை பயன்படுத்த குகல்க்கு நான் ஒப்புக்கொள்கிறேன். |
Telugu (India) | te-IN | నేను ఈ వాయిస్ యజమానిని మరియు సింతటిక్ వాయిస్ మోడల్ ని రూపొందించడానికి ఈ వాయిస్ ని ఉపయోగించడానికి googleకి నేను సమ్మతిస్తున్నాను. |
Thai (Thailand) | th-TH | ฉันเป็นเจ้าของเสียงนี้ และฉันยินยอมให้ Google ใช้เสียงนี้เพื่อสร้างแบบจำลองเสียงสังเคราะห์ |
Turkish (Turkey) | tr-TR | Bu sesin sahibi benim ve Google'ın bu sesi kullanarak sentetik bir ses modeli oluşturmasına izin veriyorum. |
Vietnamese (Vietnam) | vi-VN | Tôi là chủ sở hữu giọng nói này và tôi đồng ý cho Google sử dụng giọng nói này để tạo mô hình giọng nói tổng hợp. |
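Because the consent statement depends on the language code, it can help to keep the two together in code. The following is a minimal sketch, not part of the API: `CONSENT_SCRIPTS` and `consent_script_for` are hypothetical names, and only a subset of the table above is copied in.

```python
# Hypothetical lookup table; statements are copied from the table above.
# Only a subset of languages is included to keep the sketch short.
_ENGLISH = (
    "I am the owner of this voice and I consent to Google using this voice "
    "to create a synthetic voice model."
)

CONSENT_SCRIPTS = {
    "en-AU": _ENGLISH,
    "en-GB": _ENGLISH,
    "en-IN": _ENGLISH,
    "en-US": _ENGLISH,
    "de-DE": (
        "Ich bin der Eigentümer dieser Stimme und bin damit einverstanden, "
        "dass Google diese Stimme zur Erstellung eines synthetischen "
        "Stimmmodells verwendet."
    ),
    "es-ES": (
        "Soy el propietario de esta voz y doy mi consentimiento para que "
        "Google la utilice para crear un modelo de voz sintética."
    ),
    "fr-FR": (
        "Je suis le propriétaire de cette voix et j'autorise Google à "
        "utiliser cette voix pour créer un modèle de voix synthétique."
    ),
}


def consent_script_for(language_code: str) -> str:
    """Return the consent statement for a BCP-47 code in the subset above."""
    try:
        return CONSENT_SCRIPTS[language_code]
    except KeyError:
        raise ValueError(
            f"No consent statement recorded for {language_code!r}"
        ) from None
```

The returned string can be passed as the `consent_script` value alongside the matching `language_code` when creating a voice.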
Regional availability
Instant Custom Voice creation and synthesis are available in the following Google Cloud regions:
Google Cloud Zone | Supported Method | Launch Readiness |
---|---|---|
global | Creation, Synthesis | Private Preview |
us | Synthesis | Private Preview |
eu | Synthesis | Private Preview |
asia-southeast1 | Synthesis | Private Preview |
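Since voice creation is listed only for `global`, a client may want to check the location before sending a request. The sketch below mirrors the table above. `SUPPORTED_METHODS` and `endpoint_for` are hypothetical names, and the regional hostname pattern (`<location>-texttospeech.googleapis.com`) is an assumption that this page does not state; only the global host appears in the samples here.

```python
# Mirrors the regional availability table above.
SUPPORTED_METHODS = {
    "global": {"creation", "synthesis"},
    "us": {"synthesis"},
    "eu": {"synthesis"},
    "asia-southeast1": {"synthesis"},
}


def endpoint_for(location: str, method: str) -> str:
    """Return a base URL for the location, or raise if the method is unsupported."""
    methods = SUPPORTED_METHODS.get(location)
    if methods is None:
        raise ValueError(f"Unknown location: {location!r}")
    if method not in methods:
        raise ValueError(f"{method!r} is not available in {location!r}")
    if location == "global":
        return "https://texttospeech.googleapis.com"
    # ASSUMPTION: regional hosts are prefixed with the location name.
    return f"https://{location}-texttospeech.googleapis.com"
```

For example, `endpoint_for("us", "creation")` raises a `ValueError`, because the table lists only synthesis for `us`.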
Supported output formats
The default response format is LINEAR16. The following formats are also supported:
API Method | Format |
---|---|
streaming | ALAW, MULAW, OGG_OPUS and PCM |
batch | ALAW, MULAW, MP3, OGG_OPUS and PCM |
Feature support and limitations
Feature | Support | Description |
---|---|---|
SSML | No | SSML tags to personalize synthetic audio |
Text-Based Prompting | Experimental | Use punctuation, pauses, and disfluency to add natural flow and pacing to Text-to-Speech. |
Timestamps | No | Word-level timestamps |
Pause Tags | No | Introduce on-demand pauses to synthesized audio |
Pace Control | No | Adjust the speed of synthesized audio, from 0.25x speed to 2x speed. |
Pronunciation Control | No | Custom pronunciations of words or phrases using IPA or X-SAMPA phonetic encoding |
Use Chirp 3: Instant Custom Voice
The following sections show how to use Chirp 3: Instant Custom Voice capabilities in the Text-to-Speech API.
Record Consent and Reference Audio
- Record the consent statement: To comply with legal and ethical guidelines for Instant Custom Voice, record the required consent statement in the appropriate language as a mono WAV file with LINEAR16 encoding and a 24 kHz sampling rate. For English, the statement is: "I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model."
- Record reference audio: Use your computer microphone to record up to 10 seconds of audio as a LINEAR16-encoded, mono WAV file at a 24 kHz sampling rate. Ensure there is no background noise during the recording. Both the consent and reference audio must be recorded in the same environment.
- Store audio files: Save the recorded audio files in a designated Cloud Storage location.
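The recording requirements above (mono, LINEAR16, 24 kHz, and up to 10 seconds for the reference audio) can be checked locally before uploading. This is a minimal sketch using Python's standard `wave` module. `check_recording` is a hypothetical helper, and it treats 16-bit samples as the indicator of LINEAR16 encoding.

```python
import wave


def check_recording(path, max_seconds=None):
    """Return a list of problems found in a WAV recording (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 24000:
            problems.append(f"expected 24000 Hz, found {wav.getframerate()} Hz")
        duration = wav.getnframes() / wav.getframerate()
        if max_seconds is not None and duration > max_seconds:
            problems.append(f"expected at most {max_seconds} s, found {duration:.1f} s")
    return problems
```

For the reference audio, call `check_recording("reference.wav", max_seconds=10)`. For the consent recording, omit `max_seconds`, since no duration limit is given for it here. The file names are placeholders.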
Create an Instant Custom Voice
import base64
import json

import requests


def create_instant_custom_voice_key(
    access_token, project_id, reference_audio_bytes, consent_audio_bytes
):
    """Request a voice cloning key. Returns None if the request fails."""
    url = "https://texttospeech.googleapis.com/v1beta1/voices:generateVoiceCloningKey"

    # JSON cannot carry raw bytes, so the audio is sent as base64 text.
    request_body = {
        "reference_audio": {
            "audio_config": {"audio_encoding": "LINEAR16", "sample_rate_hertz": 24000},
            "content": base64.b64encode(reference_audio_bytes).decode("ascii"),
        },
        "voice_talent_consent": {
            "audio_config": {"audio_encoding": "LINEAR16", "sample_rate_hertz": 24000},
            "content": base64.b64encode(consent_audio_bytes).decode("ascii"),
        },
        # The consent script must match the language of the recording.
        "consent_script": "I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model.",
        "language_code": "en-US",
    }

    try:
        headers = {
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        }
        response = requests.post(url, headers=headers, json=request_body)
        response.raise_for_status()
        response_json = response.json()
        return response_json.get("voiceCloningKey")
    except requests.exceptions.RequestException as e:
        print(f"Error making API request: {e}")
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON response: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    return None
Synthesize with an Instant Custom Voice
import base64
import json

import requests
from IPython.display import Audio, display


def synthesize_text_with_cloned_voice(access_token, project_id, voice_key, text):
    """Synthesize text with a cloned voice and play the result in a notebook."""
    url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"

    request_body = {
        "input": {
            "text": text
        },
        "voice": {
            "language_code": "en-US",
            "voice_clone": {
                "voice_cloning_key": voice_key,
            }
        },
        "audioConfig": {
            "audioEncoding": "LINEAR16",
            "sample_rate_hertz": 24000
        }
    }

    try:
        headers = {
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8"
        }
        response = requests.post(url, headers=headers, json=request_body)
        response.raise_for_status()
        response_json = response.json()
        # The audio is returned as base64 text in the "audioContent" field.
        audio_content = response_json.get("audioContent")

        if audio_content:
            display(Audio(base64.b64decode(audio_content), rate=24000))
        else:
            print("Error: Audio content not found in the response.")
            print(response_json)
    except requests.exceptions.RequestException as e:
        print(f"Error making API request: {e}")
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON response: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
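The sample above plays the audio with IPython, which is only available in a notebook. Outside a notebook, the decoded audio can be written to a file instead. This is a minimal sketch: `save_audio_content` is a hypothetical helper, and it assumes the `audioContent` value is base64 text, as the decoding step in the sample above implies.

```python
import base64
from pathlib import Path


def save_audio_content(audio_content_b64: str, output_path: str) -> int:
    """Decode base64 audio content, write it to a file, and return the byte count."""
    audio_bytes = base64.b64decode(audio_content_b64)
    Path(output_path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

For a LINEAR16 response, a `.wav` extension is a reasonable choice for `output_path`, on the assumption that the returned payload already includes a WAV header.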