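Detect intent with audio output

Applications often need a bot to talk back to the end-user. Dialogflow can use Cloud Text-to-Speech, powered by DeepMind WaveNet, to generate speech responses from your agent. This conversion from intent text responses to audio is known as audio output, speech synthesis, text-to-speech, or TTS.

This guide provides an example that uses audio for both input and output when detecting an intent. This use case is common when you develop apps that communicate with users through a purely audio interface.

For a list of supported languages, see the TTS column on the Languages page.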

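Before you begin

This feature applies only when you use the API for end-user interactions. If you are using an integration, you can skip this guide.

Before reading this guide, do the following:

Read Dialogflow basics.
Perform the setup steps.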

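Create an agent

If you have not already created an agent, create one now:

Go to the Dialogflow ES console.
If prompted, sign in to the Dialogflow console. For more information, see the Dialogflow console overview.
Click Create Agent in the left sidebar menu. (If you already have other agents, click the agent name, scroll to the bottom, and click Create new agent.)
Enter your agent's name, default language, and default time zone.
If you have already created a project, enter that project. If you want to allow the Dialogflow console to create the project, select Create a new Google project.
Click the Create button.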

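Import the example file to your agent

The steps in this guide make assumptions about your agent, so you need to import an agent prepared for this guide. The import steps use the restore option, which overwrites all agent settings, intents, and entities.

To import the file, follow these steps:

Download the room-booking-agent.zip file.
Go to the Dialogflow ES console.
Select your agent.
Click the settings button next to the agent name.
Select the Export and Import tab.
Select Restore From Zip and follow the instructions to restore the ZIP file that you downloaded.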

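Detect intent

To detect intent, call the detectIntent method on the Sessions type.

REST

1. Prepare audio content

Download the book-a-room.wav sample input audio file, which says "book a room". The audio file must be Base64-encoded so that it can be provided in the JSON request below. Here is a Linux example: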

wget https://cloud.google.com/dialogflow/es/docs/data/book-a-room.wav
base64 -w 0 book-a-room.wav > book-a-room.b64

For examples on other platforms, see Embedding Base64 encoded audio in the Cloud Speech API documentation.
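If the base64 command is not available on your platform, the encoding step can be sketched in Python using only the standard library. The function name encode_audio_file and the file paths are illustrative assumptions, not part of the Dialogflow tooling.

```python
import base64
from pathlib import Path


def encode_audio_file(wav_path: str, b64_path: str) -> str:
    """Base64-encode an audio file and write the result as one unwrapped line.

    This mirrors `base64 -w 0 input > output` from the Linux example.
    """
    encoded = base64.b64encode(Path(wav_path).read_bytes()).decode("ascii")
    Path(b64_path).write_text(encoded, encoding="ascii")
    return encoded
```

For example, encode_audio_file("book-a-room.wav", "book-a-room.b64") would produce the same kind of file as the Linux command above.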

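2. Make detect intent request

Call the detectIntent method on the Sessions type and specify the Base64-encoded audio.

Before using any of the request data, make the following replacements:

PROJECT_ID: your Google Cloud project ID
SESSION_ID: a session ID
BASE64_AUDIO: the Base64-encoded content of the output file above

HTTP method and URL: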

POST https://dialogflow.googleapis.com/v2/projects/PROJECT_ID/agent/sessions/SESSION_ID:detectIntent

Request JSON body:

{
  "queryInput": {
    "audioConfig": {
      "languageCode": "en-US"
    }
  },
  "outputAudioConfig" : {
    "audioEncoding": "OUTPUT_AUDIO_ENCODING_LINEAR_16"
  },
  "inputAudio": "BASE64_AUDIO"
}
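Pasting a long Base64 string into JSON by hand is error-prone, so the request body above can also be generated. The following is a minimal Python sketch; the helper names build_request_body and write_request_file are hypothetical, and the field values are copied from the JSON body shown above.

```python
import json


def build_request_body(base64_audio: str, language_code: str = "en-US") -> dict:
    """Return the detectIntent JSON body with audio input and audio output."""
    return {
        "queryInput": {"audioConfig": {"languageCode": language_code}},
        "outputAudioConfig": {"audioEncoding": "OUTPUT_AUDIO_ENCODING_LINEAR_16"},
        "inputAudio": base64_audio,
    }


def write_request_file(base64_audio: str, path: str = "request.json") -> None:
    """Write the request body to a file that curl or PowerShell can send."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_request_body(base64_audio), f, indent=2)
```

Reading book-a-room.b64 and passing its content to write_request_file would yield a request.json file suitable for the commands that follow.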

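To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: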

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://dialogflow.googleapis.com/v2/projects/PROJECT_ID/agent/sessions/SESSION_ID:detectIntent"

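PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: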

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://dialogflow.googleapis.com/v2/projects/PROJECT_ID/agent/sessions/SESSION_ID:detectIntent" | Select-Object -Expand Content
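The same HTTP request can be assembled from a script. The sketch below uses only the Python standard library and mirrors the URL and headers of the curl command; the function name build_detect_intent_request is hypothetical, and the access token is assumed to come from gcloud auth print-access-token.

```python
import json
import urllib.request


def build_detect_intent_request(
    project_id: str, session_id: str, access_token: str, body: dict
) -> urllib.request.Request:
    """Build (but do not send) the detectIntent POST request."""
    url = (
        "https://dialogflow.googleapis.com/v2/"
        f"projects/{project_id}/agent/sessions/{session_id}:detectIntent"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends the request; the response body is the JSON document described next.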

You should receive a JSON response similar to the following:

{
  "responseId": "b7405848-2a3a-4e26-b9c6-c4cf9c9a22ee",
  "queryResult": {
    "queryText": "book a room",
    "speechRecognitionConfidence": 0.8616504,
    "action": "room.reservation",
    "parameters": {
      "time": "",
      "date": "",
      "duration": "",
      "guests": "",
      "location": ""
    },
    "fulfillmentText": "I can help with that. Where would you like to reserve a room?",
    "fulfillmentMessages": [
      {
        "text": {
          "text": [
            "I can help with that. Where would you like to reserve a room?"
          ]
        }
      }
    ],
    "intent": {
      "name": "projects/PROJECT_ID/agent/intents/e8f6a63e-73da-4a1a-8bfc-857183f71228",
      "displayName": "room.reservation"
    },
    "intentDetectionConfidence": 1,
    "diagnosticInfo": {},
    "languageCode": "en-us"
  },
  "outputAudio": "UklGRs6vAgBXQVZFZm10IBAAAAABAAEAwF0AAIC7AA..."
}

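Notice that the value of the queryResult.action field is room.reservation, and that the outputAudio field contains a large Base64 audio string.

3. Play output audio

Copy the text from the outputAudio field and save it in a file named output_audio.b64. This file needs to be converted to an audio file. Here is a Linux example: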

base64 -d output_audio.b64 > output_audio.wav

For examples on other platforms, see Decoding Base64-Encoded Audio Content in the Text-to-Speech API documentation.
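As a portable alternative to copying the field by hand, the decoding step can be sketched in Python. The sketch assumes you saved the complete JSON response as text; the function name save_output_audio is hypothetical.

```python
import base64
import json


def save_output_audio(response_json: str, wav_path: str) -> int:
    """Decode the outputAudio field of a detectIntent response into a file.

    Returns the number of audio bytes written (0 if the field is absent).
    """
    response = json.loads(response_json)
    audio = base64.b64decode(response.get("outputAudio", ""))
    with open(wav_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Calling save_output_audio(response_text, "output_audio.wav") produces the same file as the base64 -d command above.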

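You can now play the output_audio.wav audio file and hear that it matches the default platform text response in the queryResult.fulfillmentMessages field. In the sample response above, that text is in queryResult.fulfillmentMessages[0].text.text[0]. If your agent returns responses for several platforms, the audio corresponds to the element that holds the text response for the default platform.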

Java

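To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.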


import com.google.api.gax.rpc.ApiException;
import com.google.cloud.dialogflow.v2.DetectIntentRequest;
import com.google.cloud.dialogflow.v2.DetectIntentResponse;
import com.google.cloud.dialogflow.v2.OutputAudioConfig;
import com.google.cloud.dialogflow.v2.OutputAudioEncoding;
import com.google.cloud.dialogflow.v2.QueryInput;
import com.google.cloud.dialogflow.v2.QueryResult;
import com.google.cloud.dialogflow.v2.SessionName;
import com.google.cloud.dialogflow.v2.SessionsClient;
import com.google.cloud.dialogflow.v2.TextInput;
import com.google.common.collect.Maps;
import java.io.IOException;
import java.util.List;
import java.util.Map;

public class DetectIntentWithTextToSpeechResponse {

  public static Map<String, QueryResult> detectIntentWithTexttoSpeech(
      String projectId, List<String> texts, String sessionId, String languageCode)
      throws IOException, ApiException {
    Map<String, QueryResult> queryResults = Maps.newHashMap();
    // Instantiates a client
    try (SessionsClient sessionsClient = SessionsClient.create()) {
      // Set the session name using the sessionId (UUID) and projectID (my-project-id)
      SessionName session = SessionName.of(projectId, sessionId);
      System.out.println("Session Path: " + session.toString());

      // Detect intents for each text input
      for (String text : texts) {
        // Set the text (hello) and language code (en-US) for the query
        TextInput.Builder textInput =
            TextInput.newBuilder().setText(text).setLanguageCode(languageCode);

        // Build the query with the TextInput
        QueryInput queryInput = QueryInput.newBuilder().setText(textInput).build();

        // Request synthesized audio output: 16-bit linear PCM at 16 kHz
        OutputAudioEncoding audioEncoding = OutputAudioEncoding.OUTPUT_AUDIO_ENCODING_LINEAR_16;
        int sampleRateHertz = 16000;
        OutputAudioConfig outputAudioConfig =
            OutputAudioConfig.newBuilder()
                .setAudioEncoding(audioEncoding)
                .setSampleRateHertz(sampleRateHertz)
                .build();

        DetectIntentRequest dr =
            DetectIntentRequest.newBuilder()
                .setQueryInput(queryInput)
                .setOutputAudioConfig(outputAudioConfig)
                .setSession(session.toString())
                .build();

        // Performs the detect intent request
        DetectIntentResponse response = sessionsClient.detectIntent(dr);

        // Display the query result
        QueryResult queryResult = response.getQueryResult();

        System.out.println("====================");
        System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
        System.out.format(
            "Detected Intent: %s (confidence: %f)\n",
            queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
        System.out.format(
            "Fulfillment Text: '%s'\n",
            queryResult.getFulfillmentMessagesCount() > 0
                ? queryResult.getFulfillmentMessages(0).getText()
                : "Triggered Default Fallback Intent");

        queryResults.put(text, queryResult);
      }
    }
    return queryResults;
  }
}

Node.js

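To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.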

// Imports the Dialogflow client library
const dialogflow = require('@google-cloud/dialogflow').v2;

// Instantiate a Dialogflow client.
const sessionClient = new dialogflow.SessionsClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'ID of GCP project associated with your Dialogflow agent';
// const sessionId = `user specific ID of session, e.g. 12345`;
// const query = `phrase(s) to pass to detect, e.g. I'd like to reserve a room for six people`;
// const languageCode = 'BCP-47 language code, e.g. en-US';
// const outputFile = `path for audio output file, e.g. ./resources/myOutput.wav`;

// Define session path
const sessionPath = sessionClient.projectAgentSessionPath(
  projectId,
  sessionId
);
const fs = require('fs');
const util = require('util');

async function detectIntentwithTTSResponse() {
  // The text query request, asking for synthesized audio in the response
  const request = {
    session: sessionPath,
    queryInput: {
      text: {
        text: query,
        languageCode: languageCode,
      },
    },
    outputAudioConfig: {
      audioEncoding: 'OUTPUT_AUDIO_ENCODING_LINEAR_16',
    },
  };
  // Wait for the response so that errors surface to the caller
  const [response] = await sessionClient.detectIntent(request);
  console.log(`Detected intent: ${response.queryResult.intent.displayName}`);
  const audioFile = response.outputAudio;
  // Wait for the write to finish before reporting success
  await util.promisify(fs.writeFile)(outputFile, audioFile, 'binary');
  console.log(`Audio content written to file: ${outputFile}`);
}
detectIntentwithTTSResponse();

Python

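To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.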

def detect_intent_with_texttospeech_response(
    project_id, session_id, texts, language_code
):
    """Returns the result of detect intent with texts as inputs and includes
    the response in an audio format.

    Using the same `session_id` between requests allows continuation
    of the conversation."""
    from google.cloud import dialogflow

    session_client = dialogflow.SessionsClient()

    session_path = session_client.session_path(project_id, session_id)
    print("Session path: {}\n".format(session_path))

    for text in texts:
        text_input = dialogflow.TextInput(text=text, language_code=language_code)

        query_input = dialogflow.QueryInput(text=text_input)

        # Request synthesized audio output (16-bit linear PCM)
        output_audio_config = dialogflow.OutputAudioConfig(
            audio_encoding=dialogflow.OutputAudioEncoding.OUTPUT_AUDIO_ENCODING_LINEAR_16
        )

        request = dialogflow.DetectIntentRequest(
            session=session_path,
            query_input=query_input,
            output_audio_config=output_audio_config,
        )
        response = session_client.detect_intent(request=request)

        print("=" * 20)
        print("Query text: {}".format(response.query_result.query_text))
        print(
            "Detected intent: {} (confidence: {})\n".format(
                response.query_result.intent.display_name,
                response.query_result.intent_detection_confidence,
            )
        )
        print("Fulfillment text: {}\n".format(response.query_result.fulfillment_text))
        # The response's output_audio field is binary audio data.
        # Note: each iteration overwrites output.wav.
        with open("output.wav", "wb") as out:
            out.write(response.output_audio)
            print('Audio content written to file "output.wav"')

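For a description of the relevant response fields, see the Detect intent responses section.

Detect intent responses

The response for a detect intent request is a DetectIntentResponse object.

Normal detect intent processing controls the content of the DetectIntentResponse.queryResult.fulfillmentMessages field.

The DetectIntentResponse.outputAudio field is populated with audio based on the values of the default platform text responses found in the DetectIntentResponse.queryResult.fulfillmentMessages field. If multiple default text responses exist, they are concatenated when generating audio. If no default platform text responses exist, the generated audio content is empty.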
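To see which text the audio is expected to be based on, you can collect the default platform text responses from a REST response yourself. The sketch below works on the parsed JSON; it assumes that a default platform message either omits the platform field or sets it to PLATFORM_UNSPECIFIED, and the function name default_platform_texts is hypothetical. It only gathers the text and does not reproduce how the service joins it for synthesis.

```python
def default_platform_texts(query_result: dict) -> list:
    """Collect text responses for the default platform from a queryResult dict."""
    texts = []
    for message in query_result.get("fulfillmentMessages", []):
        # Assumption: the default platform is serialized as a missing
        # "platform" key or as "PLATFORM_UNSPECIFIED".
        platform = message.get("platform", "PLATFORM_UNSPECIFIED")
        if platform == "PLATFORM_UNSPECIFIED" and "text" in message:
            texts.extend(message["text"].get("text", []))
    return texts
```

An empty result suggests that the outputAudio field will be empty as well, as described above.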

The DetectIntentResponse.outputAudioConfig field is populated with the audio settings used to generate the output audio.
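You can cross-check those settings against the audio itself. The following sketch reads the header of the decoded audio with the Python wave module; it assumes that OUTPUT_AUDIO_ENCODING_LINEAR_16 output is a WAV container, which the .wav file name used in this guide suggests. The function name describe_wav is hypothetical.

```python
import io
import wave


def describe_wav(audio_bytes: bytes) -> dict:
    """Return basic format information read from WAV audio bytes."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hertz": wav.getframerate(),
            "duration_seconds": wav.getnframes() / wav.getframerate(),
        }
```

The sample_rate_hertz value can then be compared with the sample rate reported in the outputAudioConfig field.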

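Detect intent from a stream

When detecting intent from a stream, you send requests similar to the example that does not use output audio. For more information, see Detecting Intent from a Stream. However, you must also supply an OutputAudioConfig field in the request. The output_audio and output_audio_config fields are populated in the very last streaming response that you get from the Dialogflow API server. For more information, see StreamingDetectIntentRequest and StreamingDetectIntentResponse.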
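The message ordering can be sketched as follows. Plain dictionaries stand in for StreamingDetectIntentRequest and StreamingDetectIntentResponse messages and use their snake_case field names; the helper names, the input encoding, and the 16000 Hz input sample rate are assumptions for illustration, so adjust them to your audio.

```python
def streaming_requests(session_path, audio_chunks,
                       language_code="en-US", sample_rate_hertz=16000):
    """Yield request messages: one configuration message, then audio chunks."""
    # The first message carries the session and configuration, including the
    # output audio configuration, but no audio.
    yield {
        "session": session_path,
        "query_input": {
            "audio_config": {
                "audio_encoding": "AUDIO_ENCODING_LINEAR_16",  # assumed input format
                "sample_rate_hertz": sample_rate_hertz,  # assumed input rate
                "language_code": language_code,
            }
        },
        "output_audio_config": {
            "audio_encoding": "OUTPUT_AUDIO_ENCODING_LINEAR_16",
        },
    }
    # Every following message carries only a chunk of input audio.
    for chunk in audio_chunks:
        yield {"input_audio": chunk}


def final_output_audio(responses):
    """Return the audio from the last response that contains output_audio."""
    audio = b""
    for response in responses:
        audio = response.get("output_audio") or audio
    return audio
```

With the Python client library, the same fields would be set on real request objects and passed to the streaming detect intent call instead of being yielded as dictionaries.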

Agent settings for speech

You can control various aspects of speech synthesis. See the agent speech settings.

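Use the Dialogflow simulator

You can interact with the agent and receive audio responses through the Dialogflow simulator:

Enable automatic text-to-speech in the agent speech settings described above.
Type or say "book a room" in the simulator.
See the output audio section at the bottom of the simulator.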
