Learn how you can import audio and transcript files with their metadata using the API. You can import a single file using the UploadConversation API, or you can bulk import all the files from a Cloud Storage bucket using the IngestConversations API.
The two request commands, UploadConversation and IngestConversations, support the following functions:
Request command | Number of files | Speech-to-Text | Redaction | Metadata ingestion | Automatic analysis
---|---|---|---|---|---
UploadConversation | 1 | ✔ | ✔ | ✔ (metadata in the request) | ✔
IngestConversations | All files in a bucket | ✔ | ✔ | ✔ (metadata files in a bucket) | ✔
Prerequisites
- Enable the Cloud Storage, Speech-to-Text, Cloud Data Loss Prevention, and Conversational Insights APIs on your Google Cloud project.
- Save your conversation data (dual-channel audio and transcript files) in a Cloud Storage bucket. Note the object path with the following format:
gs://<bucket>/<object>
- Give the Speech-to-Text and Conversational Insights service agents access to the objects in your Cloud Storage bucket. See this troubleshooting page for help with service accounts.
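For example, assuming the Conversational Insights service agent has the form service-PROJECT_NUMBER@gcp-sa-contactcenterinsights.iam.gserviceaccount.com (confirm the exact addresses on the troubleshooting page), you could grant read access to a bucket with a command like the following sketch:

gcloud storage buckets add-iam-policy-binding gs://transcript-bucket-name \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-contactcenterinsights.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"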
If you opt to import conversation metadata, ensure that the metadata files are in their own bucket and that each metadata filename matches its corresponding conversation filename. For example, a conversation with the Cloud Storage URI gs://transcript-bucket-name/conversation.mp3 must have a corresponding metadata file such as gs://metadata-bucket-name/conversation.json.
Conversation data
Conversation data consists of voice or chat transcripts and audio.
Transcripts
Chat transcripts must be supplied as JSON-formatted files that match the CCAI conversation data format.
Voice transcripts can be supplied in the CCAI conversation data format or as the returned speech recognition result of a Speech-to-Text API transcription. The response is identical for synchronous and asynchronous recognition across all Speech-to-Text API versions.
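As a reference, the following is a minimal sketch of a chat transcript in the CCAI conversation data format; the field names follow that format, and the text, user IDs, and timestamps are placeholder values:

{
  "entries": [
    {
      "start_timestamp_usec": 1000000,
      "text": "Hi, I need help with my order.",
      "role": "CUSTOMER",
      "user_id": 1
    },
    {
      "start_timestamp_usec": 5000000,
      "text": "Happy to help. What's your order number?",
      "role": "AGENT",
      "user_id": 2
    }
  ]
}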
Audio
Conversational Insights uses Cloud Speech-to-Text batch recognition to transcribe audio. Insights configures Speech-to-Text transcription settings with Recognizer resources. You can specify a custom recognizer in the request; if you don't provide a recognizer in either Settings or the request, Insights creates a default ccai-insights-recognizer in your project.
The Insights recognizer transcribes English speech using the telephony model, and the default language is en-US. For a full list of Speech-to-Text support per region, language, model, and recognition feature, refer to the Speech-to-Text language support docs.
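If the defaults don't fit your audio, you can create your own Speech-to-Text V2 recognizer and reference it in speech_config. The following is a minimal sketch against the Speech-to-Text V2 REST API; the recognizer ID my-insights-recognizer is an example name, and the exact Recognizer fields may differ depending on your API version:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d '{"model": "telephony", "languageCodes": ["en-US"]}' \
  "https://speech.googleapis.com/v2/projects/PROJECT_ID/locations/LOCATION_ID/recognizers?recognizerId=my-insights-recognizer"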
Before your first audio import to Insights, decide whether you want to:
- Use a custom Speech-to-Text transcription configuration.
- Analyze the (optionally) redacted conversations.
You can configure these actions to run by default in each UploadConversation or IngestConversations request by setting the proper fields in the project Settings resource. The speech and redaction settings can also be overridden per request. If you don't specify any settings, Insights uses the default speech configuration and doesn't redact the transcripts.
Redaction
Cloud Data Loss Prevention does not redact transcripts unless you explicitly supply redaction configs in the project Settings, the UploadConversationRequest, or the IngestConversationsRequest. Cloud Data Loss Prevention supports both inspection templates and de-identification templates for redaction.
Configure project settings
Redaction and speech can be configured for UploadConversation and IngestConversations requests by setting the corresponding project settings parameters. These configurations can also be set individually per request, which overrides the project settings. The analysis_percentage configured in an analysis rule overrides the upload_conversation_analysis_percentage configured through project settings.
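For example, a request body that sets default redaction templates, a default recognizer, and an automatic-analysis percentage might look like the following sketch; the template and recognizer resource names are placeholders that you must replace with your own:

{
  "redaction_config": {
    "deidentify_template": "projects/PROJECT_ID/deidentifyTemplates/DEIDENTIFY_TEMPLATE_ID",
    "inspect_template": "projects/PROJECT_ID/inspectTemplates/INSPECT_TEMPLATE_ID"
  },
  "speech_config": {
    "speech_recognizer": "projects/PROJECT_ID/locations/LOCATION_ID/recognizers/RECOGNIZER_ID"
  },
  "analysis_config": {
    "upload_conversation_analysis_percentage": 100
  }
}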
Save the request body in a file called request.json, and execute the following command:
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/settings?updateMask=redaction_config,speech_config,analysis_config.upload_conversation_analysis_percentage"
Metadata
Use metadata to perform a single-file or bulk import.
Import one file
For a single-file import, include your quality metadata in the curl command for the UploadConversationRequest.
curl --request POST \
  'https://contactcenterinsights.googleapis.com/v1/projects/project-id/locations/location-id/conversations:upload' \
  --header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --data '{
    "conversation": {
      "qualityMetadata": {
        "agentInfo": [
          {
            "agentId": "agent-id",
            "displayName": "agent-name"
          }
        ]
      },
      "dataSource": {
        "gcsSource": {
          "transcriptUri": "gs://path/to_transcript"
        }
      }
    }
  }'
Do a bulk import
Supply conversation metadata files as JSON-formatted files in a bucket specified in the gcs_source.metadata_bucket_uri field of the IngestConversationsRequest. Insights populates conversation quality metadata found in the file, but you can also create custom metadata.
For example, to specify a custom conversation ID for each conversation in your dataset, specify custom metadata on the conversation object within Cloud Storage. Set the key to ccai_insights_conversation_id; the value is your custom conversation ID. Custom conversation IDs can also be provided within the metadata file.
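For example, assuming the gsutil CLI, you could attach the custom key as Cloud Storage object metadata, which gsutil exposes through the x-goog-meta- header prefix:

gsutil setmeta -h "x-goog-meta-ccai_insights_conversation_id:custom-conversation-id" \
  gs://transcript-bucket-name/conversation.json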
If you provide any custom metadata in the custom_metadata_keys field of an IngestConversationsRequest, Insights stores that custom metadata in the conversation labels. Insights supports up to 100 labels.
See the following example of a valid metadata file:
{ "customer_satisfaction_rating": 5, "agent_info": [ { "agent_id": "123456", "display_name": "Agent Name", "team": "Agent Team", "disposition_code": "resolved" } ], "custom_key": "custom value" "conversation_id": "custom-conversation-id" }
Import a single audio file
The UploadConversation API creates a long-running operation that transcribes and optionally redacts your conversations. An audio file is transcribed if the conversation contains only an audio_uri in the DataSource. Otherwise, the provided transcript_uri is read and used.
Request JSON body:
{ "conversation": { "data_source": { "gcs_source": { "audio_uri": AUDIO_URI } } }, "redaction_config": { "deidentify_template": DEIDENTIFY_TEMPLATE, "inspect_template": INSPECT_TEMPLATE }, "speech_config": { "speech_recognizer": RECOGNIZER_NAME } }
Save the request body in a file called request.json, and execute the following command:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/conversations:upload"
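Because UploadConversation is a long-running method, the response contains the operation name rather than the finished conversation, similar to the following sketch:

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/operations/OPERATION_ID"
}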
Bulk import
REST
Refer to the conversations:ingest API endpoint for complete details.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud Platform project ID.
- GCS_BUCKET_URI: the Cloud Storage URI that points to the bucket containing the conversation transcripts. May contain a prefix, for example gs://BUCKET_NAME or gs://BUCKET_NAME/PREFIX. Wildcards are not supported.
- MEDIUM: set to either PHONE_CALL or CHAT depending on the data type. If unspecified, the default value is PHONE_CALL.
- AGENT_ID: Optional. The agent ID for the entire bucket.
HTTP method and URL:
POST https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/conversations:ingest
Request JSON body:
{ "gcsSource": { "bucketUri": "GCS_BUCKET_URI", "bucketObjectType": "AUDIO
" }, "transcriptObjectConfig": { "medium": "PHONE_CALL
" }, "conversationConfig": { "agentId": "AGENT_ID", "agentChannel": "AGENT_CHANNEL", "customerChannel": "CUSTOMER_CHANNEL" } } Or { "gcsSource": { "bucketUri": "GCS_BUCKET_URI", "bucketObjectType": "TRANSCRIPT
" }, "transcriptObjectConfig": { "medium": "MEDIUM" }, "conversationConfig": {"agentId": "AGENT_ID"} }
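Following the same pattern as the earlier examples, save the request body in a file called request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/conversations:ingest"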
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.contactcenterinsights.v1main.IngestConversationsMetadata", "createTime": "...", "request": { "parent": "projects/PROJECT_ID/locations/us-central1", "gcsSource": { "bucketUri": "GCS_BUCKET_URI", "bucketObjectType": "BUCKET_OBJECT_TYPE" }, "transcriptObjectConfig": { "medium": "MEDIUM" }, "conversationConfig": { "agentId": "AGENT_ID" } } } }
Poll the operation
Both the UploadConversation and IngestConversations requests return a long-running operation. Long-running methods are asynchronous, and the operation might not yet be completed when the method returns a response. You can poll the operation to check its status. See the long-running operations page for details and code samples.
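For example, a minimal polling call is a GET request on the operation name returned in the response:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID"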
Speech-to-Text quotas
Conversational Insights uses two different Speech-to-Text APIs: BatchRecognize and GetOperation. Conversational Insights makes a BatchRecognize request to start the Speech-to-Text transcription and GetOperation requests to monitor whether the transcription is finished. Each request type consumes its own per-minute, per-region quota.
For a single UploadConversation call, Conversational Insights consumes one BatchRecognize request but possibly multiple GetOperation requests, depending on the duration of the task. For a bulk import, Conversational Insights consumes 100 requests of each type.