Speech-to-Text

Accurately convert speech into text with an API powered by the best of Google’s AI research and technology.

New customers get $300 in free credits to spend on Speech-to-Text. All customers get 60 minutes of audio transcription and analysis free each month, not charged against credits.

Transcribe your content with accurate captions
Enable the power of voice to create better user experiences
Improve your service with insights from customer interactions
Get started quickly with our in-console tutorial

Gartner names Google a Leader in the 2022 Gartner® Magic Quadrant™ for Cloud AI Developer Services report.

Register to download the report

Benefits

State-of-the-art accuracy

Leverage Google’s most advanced deep learning neural network algorithms for automatic speech recognition (ASR).

Easy model customization

Experiment with, create, and manage custom resources with the Speech-to-Text UI.

Flexible model deployment

Deploy ASR wherever you need it, whether in the cloud with the API or on-premises with Speech-to-Text On-Prem.

Demo

Put Speech-to-Text into action

You can add speech transcription to your applications with the Speech-to-Text API.

Key features

Speech adaptation

Provide hints to boost the transcription accuracy of rare and domain-specific words or phrases. Use classes to automatically convert spoken numbers into addresses, years, currencies, and more.
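
As a rough illustration of the hints and classes described above, the sketch below builds a JSON request body for the v1 REST `speech:recognize` method using only the Python standard library. The helper name `build_adaptation_request`, the bucket path, the phrases, and the boost value are all hypothetical; the `$MONEY` and `$YEAR` class tokens are taken from the public class-token list as best understood here, so check the supported tokens for your language before relying on them.

```python
import json


def build_adaptation_request(gcs_uri, phrases, boost=10.0):
    """Return a JSON-serializable body for a v1 `speech:recognize` call.

    `phrases` are the hints; `boost` is a positive weight applied to them.
    This helper is illustrative and is not part of any Google library.
    """
    return {
        "config": {
            "languageCode": "en-US",
            # One speech context carrying every hint, all with the same boost.
            "speechContexts": [{"phrases": list(phrases), "boost": boost}],
        },
        "audio": {"uri": gcs_uri},
    }


request = build_adaptation_request(
    "gs://my-bucket/order-call.wav",  # hypothetical Cloud Storage object
    [
        "Chirp",                      # a rare, domain-specific word
        "my order total is $MONEY",   # class token for spoken currency
        "I moved here in $YEAR",      # class token for spoken years
    ],
)
print(json.dumps(request, indent=2))
```

The body is only built and printed, not sent. Posting it to the API requires an authenticated HTTP client, which is left out of this sketch.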

Domain-specific models

Choose from a selection of trained models for voice control, phone call, and video transcription optimized for domain-specific quality requirements.
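
One way to picture model selection is as a small lookup from use case to the `model` field of the recognition config. The sketch below assumes the v1 model identifiers `command_and_search`, `phone_call`, and `video`, plus the `useEnhanced` flag; the mapping keys and the helper name `build_model_config` are hypothetical.

```python
# Hypothetical use-case labels mapped to v1 model identifiers.
MODEL_BY_USE_CASE = {
    "voice_control": "command_and_search",
    "phone_call": "phone_call",
    "video": "video",
}


def build_model_config(use_case, language_code="en-US", enhanced=False):
    """Return a recognition config dict that selects a domain-specific model."""
    config = {
        "languageCode": language_code,
        "model": MODEL_BY_USE_CASE[use_case],  # KeyError for unknown use cases
    }
    if enhanced:
        # Only request an enhanced variant when the caller asks for one.
        config["useEnhanced"] = True
    return config


print(build_model_config("phone_call", enhanced=True))
print(build_model_config("video"))
```

Model availability varies by language, so a real application would likely validate the combination against the supported languages list before sending a request.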

Easily compare quality

Experiment on your speech audio with our easy-to-use user interface. Try different configurations to optimize quality and accuracy.

Speech On-Device

Run Google Cloud's speech algorithms locally on any device, regardless of internet connectivity. Promise users that their voice data will never leave their device.

Foundation model for Speech-to-Text

Build voice-enabled applications for global audiences with speech models that are powered by Chirp, Google Cloud’s foundation model for speech trained on millions of hours of audio data and billions of text sentences.

View all features

BLOG

Unveiling a new user interface for Google Cloud’s STT API

Customers

Learn from customers building audio and voice solutions with Speech-to-Text

Case study

HSBC uses Speech-to-Text to train ML models to classify, extract, and detect customer sentiment.

5-min read

Case study

Josh Talks: Delivering glitchless online learning experiences with Firebase

5-min read

Case study

MRV uses Speech-to-Text and expands accessibility for its audience

5-min read

Case study

Isentia is using Speech-to-Text to deliver media intelligence products to help manage reputations.

5-min read

Case study

Castbox uses Speech-to-Text to deliver its in-audio search service for podcasts.

5-min read

Case study

Voximplant helps companies build voice solutions and boost the number of calls they can handle.

5-min read

See all customers

What's new

Sign up for Google Cloud newsletters to receive product updates, event information, special offers, and more.

Video

How do I add AI into my apps?

Report

Customize the Speech-to-Text API to improve accuracy

Video

Combining AI APIs to work together

Blog post

The ultimate guide to Speech on Google Cloud

Blog post

Google Cloud launches new models for more accurate Speech AI

How companies are doing more with Speech AI

Blog post

Learn how Google Cloud customers are using Speech AI

Documentation

Tutorials and getting started resources

Speech-to-Text in-console tutorials

Learn how to add Speech-to-Text to your existing apps. Get started quickly with our in-console tutorial.

Speech-to-Text basics

Learn the fundamental concepts in Speech-to-Text.

Quickstart: Using the Speech-to-Text UI

Learn how to set up your project and authorization, create and refine a transcription, and use this configuration in the console.

Quickstart: Using the gcloud tool

Send an audio transcription request to Speech-to-Text using the gcloud tool from the command line.

Best practices

Review the best practices for transcribing audio with Speech-to-Text.

ML onramp

Explore Speech-to-Text tutorials, codelabs, and more.

Supported languages

Learn which languages are available for Speech-to-Text, plus the features and recognition models available for each.

Speech-to-Text On-Prem

Learn more about Speech-to-Text On-Prem, which enables easy integration of Google speech recognition technology into your on-premises solutions.

Use cases

Use case

Improve customer service

Empower your customer service system by adding IVR (interactive voice response) and agent conversations to your call centers. Perform analytics on your conversation data to gain more insight into the calls and your customers. Speech-to-Text and its enhanced phone call models already power Contact Center AI, Google Cloud’s contact center solution.

Workflow: Contact center audio data moves through Google Cloud products in five steps. Audio lands in Cloud Storage, then is (1) transcribed with the Speech-to-Text API, (2) analyzed with the Natural Language API, (3) redacted of PII with Cloud Data Loss Prevention, and (4) stored in BigQuery. From BigQuery, data flows in both directions to (5) query and visualize call data.

Use case

Enable voice control

Implement voice commands such as “turn the volume up,” and voice search such as saying “what is the temperature in Paris?” Combine this with the Text-to-Speech API to deliver voice-enabled experiences in IoT (Internet of Things) applications.

Workflow: Voice control with the Speech-to-Text API. (1) A user voice command reaches the user device, which has a two-way connection to (0) a unique secure identity in Cloud IoT Core and a two-way connection to Cloud Functions. Cloud Functions sends the audio to be (2) transcribed with the Speech-to-Text API, then to AutoML Natural Language for (3) intent and entity extraction, and the result returns through Cloud Functions to the user device.

Use case

Transcribe multimedia content

Transcribe your audio and video to include captions and improve your audience reach and experience. Add subtitles to your streaming content in real time. Our video transcription model is ideal for indexing or subtitling video or multispeaker content, and uses machine learning technology similar to video captioning on YouTube.

View all technical guides

All features

Global vocabulary	Support your global user base with Speech-to-Text’s support for over 125 languages and variants.
Streaming speech recognition	Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).
Speech adaptation	Customize speech recognition to transcribe domain-specific terms and rare words by providing hints that boost the transcription accuracy of specific words or phrases. Automatically convert spoken numbers into addresses, years, currencies, and more using classes.
Speech-to-Text On-Prem	Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.
Multichannel recognition	Speech-to-Text can recognize distinct channels in multichannel situations (e.g., video conference) and annotate the transcripts to preserve the order.
Noise robustness	Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation.
Domain-specific models	Choose from a selection of trained models for voice control, phone call, and video transcription optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling rate.
Content filtering	Profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter out profane words in text results.
Transcription evaluation	Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on your configuration.
Automatic punctuation (beta)	Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods).
Speaker diarization (beta)	Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance.
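
Several of the features listed above are switched on through fields of the recognition config. The sketch below shows how multichannel recognition, content filtering, automatic punctuation, and speaker diarization might be combined in one v1 REST config, assuming the field names `audioChannelCount`, `enableSeparateRecognitionPerChannel`, `profanityFilter`, `enableAutomaticPunctuation`, and `diarizationConfig`. The helper `build_feature_config` and its parameters are hypothetical.

```python
def build_feature_config(channels=1, max_speakers=None,
                         punctuate=True, filter_profanity=False):
    """Return a recognition config dict with optional features enabled."""
    config = {
        "languageCode": "en-US",
        "enableAutomaticPunctuation": punctuate,
        "profanityFilter": filter_profanity,
    }
    if channels > 1:
        # Transcribe each channel separately so results stay attributed.
        config["audioChannelCount"] = channels
        config["enableSeparateRecognitionPerChannel"] = True
    if max_speakers:
        # Ask the service to label which speaker said each word.
        config["diarizationConfig"] = {
            "enableSpeakerDiarization": True,
            "minSpeakerCount": 1,
            "maxSpeakerCount": max_speakers,
        }
    return config


# A two-channel conference recording with up to three speakers.
print(build_feature_config(channels=2, max_speakers=3))
```

Whether multichannel recognition and diarization are both appropriate for a single recording depends on the audio; they are combined here only to keep the example short.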

Pricing

Speech-to-Text is priced based on the amount of audio successfully processed by the service each month, measured in increments of one second.
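
To make the per-second metering concrete, here is a small arithmetic sketch. It assumes that each request is rounded up to the next whole second and that the 60 free minutes per month mentioned at the top of this page are deducted first. The per-minute rate is a made-up placeholder, not a published price, and the function names are hypothetical; see the pricing details for real figures.

```python
import math

FREE_SECONDS_PER_MONTH = 60 * 60  # the 60 free minutes, expressed in seconds


def billable_seconds(durations_s):
    """Sum per-request durations, rounded up to whole seconds (assumption),
    then subtract the monthly free allowance."""
    total = sum(math.ceil(d) for d in durations_s)
    return max(0, total - FREE_SECONDS_PER_MONTH)


def estimate_cost(durations_s, rate_per_minute):
    """Estimate the monthly charge for a hypothetical per-minute rate."""
    return billable_seconds(durations_s) / 60 * rate_per_minute


# Two requests: 3599.2 s rounds up to 3600 s and 30.5 s rounds up to 31 s.
month = [3599.2, 30.5]
print(billable_seconds(month))     # 31 seconds beyond the free allowance
print(estimate_cost(month, 0.02))  # 0.02 per minute is a placeholder rate
```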

View pricing details
