Detect intent with audio input

This guide shows you how to use the API to send audio input in a detect intent request. Dialogflow processes the audio and converts it to text before attempting an intent match. This conversion is known as audio input, speech recognition, speech-to-text, or STT.

Before you begin

This feature is only applicable when using the API for user interactions. If you are using an integration, you can skip this guide.

Before reading this guide, do the following:

  1. Read Dialogflow basics.
  2. Perform setup steps.

Create an agent

If you have not already created an agent, create one now:

  1. Go to the Dialogflow ES Console.
  2. Sign in to the Dialogflow Console if requested. See the Dialogflow console overview for more information.
  3. Click Create Agent in the left sidebar menu. (If you already have other agents, click the agent name, scroll to the bottom, then click Create new agent.)
  4. Enter your agent's name, default language, and default time zone.
  5. If you have already created a project, enter that project. If you want to allow the Dialogflow Console to create the project, select Create a new Google project.
  6. Click the Create button.

Import the example file to your agent

The steps in this guide make assumptions about your agent, so you need to import the agent prepared for this guide. When importing, these steps use the restore option, which overwrites all agent settings, intents, and entities.

To import the file, follow these steps:

  1. Download the room-booking-agent.zip file.
  2. Go to the Dialogflow ES Console.
  3. Select your agent.
  4. Click the settings button next to the agent name.
  5. Select the Export and Import tab.
  6. Select Restore From Zip and follow the instructions to restore the zip file that you downloaded.

Detect intent

To detect intent, call the detectIntent method on the Sessions type.

REST

Download the book-a-room.wav sample input audio file, which says "book a room". The audio file must be base64 encoded for this example, so it can be provided in the JSON request below. Here is a Linux example:

wget https://cloud.google.com/dialogflow/es/docs/data/book-a-room.wav
base64 -w 0 book-a-room.wav > book-a-room.b64

For examples on other platforms, see the Base64 encoding audio content section of the Cloud Speech-to-Text API documentation.
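If shell tools are not available, the same encoding can be produced with a short, platform-independent Python script. This is a minimal sketch that assumes the file was downloaded to the current directory:

import base64

# Read the sample audio and write its base64 encoding to a file,
# mirroring the Linux commands above.
with open("book-a-room.wav", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

with open("book-a-room.b64", "w") as out:
    out.write(encoded)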

Before using any of the request data, make the following replacements:

  • PROJECT_ID: your Google Cloud project ID
  • AUDIO: the base64-encoded audio content

HTTP method and URL (the 123456789 path segment is an arbitrary session ID; you can use any unique value):

POST https://dialogflow.googleapis.com/v2/projects/PROJECT_ID/agent/sessions/123456789:detectIntent

Request JSON body:

{
  "queryInput": {
    "audioConfig": {
      "languageCode": "en-US"
    }
  },
  "inputAudio": "AUDIO"
}

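To send the request, save the JSON body to a file such as request.json (with AUDIO replaced by the contents of book-a-room.b64) and POST it to the URL above. A typical curl invocation, assuming the gcloud CLI is installed and application default credentials are configured, looks like this:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://dialogflow.googleapis.com/v2/projects/PROJECT_ID/agent/sessions/123456789:detectIntent"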

You should receive a JSON response similar to the following:

{
  "responseId": "3c1e5a89-75b9-4c3f-b63d-4b1351dd5e32",
  "queryResult": {
    "queryText": "book a room",
    "action": "room.reservation",
    "parameters": {
      "time": "",
      "date": "",
      "guests": "",
      "duration": "",
      "location": ""
    },
    "fulfillmentText": "I can help with that. Where would you like to reserve a room?",
    "fulfillmentMessages": [
      {
        "text": {
          "text": [
            "I can help with that. Where would you like to reserve a room?"
          ]
        }
      }
    ],
    "intent": {
      "name": "projects/PROJECT_ID/agent/intents/e8f6a63e-73da-4a1a-8bfc-857183f71228",
      "displayName": "room.reservation"
    },
    "intentDetectionConfidence": 1,
    "diagnosticInfo": {},
    "languageCode": "en-us"
  }
}

Note that the value of the queryResult.action field is "room.reservation", and the value of the queryResult.fulfillmentMessages[0].text.text[0] field asks the user for more information.

Go

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package detect

import (
	"context"
	"fmt"
	"os"

	// Client and proto packages for the Dialogflow ES (v2) API.
	dialogflow "cloud.google.com/go/dialogflow/apiv2"
	"cloud.google.com/go/dialogflow/apiv2/dialogflowpb"
)

// DetectIntentAudio sends an audio file to Dialogflow and returns the
// matched intent's fulfillment text.
func DetectIntentAudio(projectID, sessionID, audioFile, languageCode string) (string, error) {
	ctx := context.Background()

	sessionClient, err := dialogflow.NewSessionsClient(ctx)
	if err != nil {
		return "", err
	}
	defer sessionClient.Close()

	if projectID == "" || sessionID == "" {
		return "", fmt.Errorf("detect.DetectIntentAudio empty project (%s) or session (%s)", projectID, sessionID)
	}

	sessionPath := fmt.Sprintf("projects/%s/agent/sessions/%s", projectID, sessionID)

	// In this example, we hard code the encoding and sample rate for simplicity.
	audioConfig := dialogflowpb.InputAudioConfig{AudioEncoding: dialogflowpb.AudioEncoding_AUDIO_ENCODING_LINEAR_16, SampleRateHertz: 16000, LanguageCode: languageCode}

	queryAudioInput := dialogflowpb.QueryInput_AudioConfig{AudioConfig: &audioConfig}

	audioBytes, err := os.ReadFile(audioFile)
	if err != nil {
		return "", err
	}

	queryInput := dialogflowpb.QueryInput{Input: &queryAudioInput}
	request := dialogflowpb.DetectIntentRequest{Session: sessionPath, QueryInput: &queryInput, InputAudio: audioBytes}

	response, err := sessionClient.DetectIntent(ctx, &request)
	if err != nil {
		return "", err
	}

	queryResult := response.GetQueryResult()
	fulfillmentText := queryResult.GetFulfillmentText()
	return fulfillmentText, nil
}
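A caller could invoke this with hypothetical values such as DetectIntentAudio("my-project-id", "123456789", "book-a-room.wav", "en-US") and print the returned fulfillment text.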

Java

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.api.gax.rpc.ApiException;
import com.google.cloud.dialogflow.v2.AudioEncoding;
import com.google.cloud.dialogflow.v2.DetectIntentRequest;
import com.google.cloud.dialogflow.v2.DetectIntentResponse;
import com.google.cloud.dialogflow.v2.InputAudioConfig;
import com.google.cloud.dialogflow.v2.QueryInput;
import com.google.cloud.dialogflow.v2.QueryResult;
import com.google.cloud.dialogflow.v2.SessionName;
import com.google.cloud.dialogflow.v2.SessionsClient;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DetectIntentAudio {

  // DialogFlow API Detect Intent sample with audio files.
  public static QueryResult detectIntentAudio(
      String projectId, String audioFilePath, String sessionId, String languageCode)
      throws IOException, ApiException {
    // Instantiates a client
    try (SessionsClient sessionsClient = SessionsClient.create()) {
      // Set the session name using the sessionId (UUID) and projectID (my-project-id)
      SessionName session = SessionName.of(projectId, sessionId);
      System.out.println("Session Path: " + session.toString());

      // Note: hard coding audioEncoding and sampleRateHertz for simplicity.
      // Audio encoding of the audio content sent in the query request.
      AudioEncoding audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16;
      int sampleRateHertz = 16000;

      // Instructs the speech recognizer how to process the audio content.
      InputAudioConfig inputAudioConfig =
          InputAudioConfig.newBuilder()
              .setAudioEncoding(
                  audioEncoding) // audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16
              .setLanguageCode(languageCode) // languageCode = "en-US"
              .setSampleRateHertz(sampleRateHertz) // sampleRateHertz = 16000
              .build();

      // Build the query with the InputAudioConfig
      QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();

      // Read the bytes from the audio file
      byte[] inputAudio = Files.readAllBytes(Paths.get(audioFilePath));

      // Build the DetectIntentRequest
      DetectIntentRequest request =
          DetectIntentRequest.newBuilder()
              .setSession(session.toString())
              .setQueryInput(queryInput)
              .setInputAudio(ByteString.copyFrom(inputAudio))
              .build();

      // Performs the detect intent request
      DetectIntentResponse response = sessionsClient.detectIntent(request);

      // Display the query result
      QueryResult queryResult = response.getQueryResult();
      System.out.println("====================");
      System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
      System.out.format(
          "Detected Intent: %s (confidence: %f)\n",
          queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
      System.out.format(
          "Fulfillment Text: '%s'\n",
          queryResult.getFulfillmentMessagesCount() > 0
              ? queryResult.getFulfillmentMessages(0).getText()
              : "Triggered Default Fallback Intent");

      return queryResult;
    }
  }
}
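A caller could invoke the sample with hypothetical values such as:

QueryResult result =
    DetectIntentAudio.detectIntentAudio(
        "my-project-id", "book-a-room.wav", "123456789", "en-US");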

Node.js

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const fs = require('fs');
const util = require('util');
const {struct} = require('pb-util');
// Imports the Dialogflow library
const dialogflow = require('@google-cloud/dialogflow');
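
// TODO(developer): The following variables are assumed to be defined by the
// caller; this snippet runs inside an async function (it uses await).
// Hypothetical values:
// const projectId = 'my-project-id';
// const sessionId = '123456789';
// const filename = './book-a-room.wav';
// const encoding = 'AUDIO_ENCODING_LINEAR_16';
// const sampleRateHertz = 16000;
// const languageCode = 'en-US';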

// Instantiates a session client
const sessionClient = new dialogflow.SessionsClient();

// The path to identify the agent that owns the created intent.
const sessionPath = sessionClient.projectAgentSessionPath(
  projectId,
  sessionId
);

// Read the content of the audio file and send it as part of the request.
const readFile = util.promisify(fs.readFile);
const inputAudio = await readFile(filename);
const request = {
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: encoding,
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode,
    },
  },
  inputAudio: inputAudio,
};

// Recognizes the speech in the audio and detects its intent.
const [response] = await sessionClient.detectIntent(request);

console.log('Detected intent:');
const result = response.queryResult;
// Instantiates a context client
const contextClient = new dialogflow.ContextsClient();

console.log(`  Query: ${result.queryText}`);
console.log(`  Response: ${result.fulfillmentText}`);
if (result.intent) {
  console.log(`  Intent: ${result.intent.displayName}`);
} else {
  console.log('  No intent matched.');
}
const parameters = JSON.stringify(struct.decode(result.parameters));
console.log(`  Parameters: ${parameters}`);
if (result.outputContexts && result.outputContexts.length) {
  console.log('  Output contexts:');
  result.outputContexts.forEach(context => {
    const contextId =
      contextClient.matchContextFromProjectAgentSessionContextName(
        context.name
      );
    const contextParameters = JSON.stringify(
      struct.decode(context.parameters)
    );
    console.log(`    ${contextId}`);
    console.log(`      lifespan: ${context.lifespanCount}`);
    console.log(`      parameters: ${contextParameters}`);
  });
}

Python

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

def detect_intent_audio(project_id, session_id, audio_file_path, language_code):
    """Returns the result of detect intent with an audio file as input.

    Using the same `session_id` between requests allows continuation
    of the conversation."""
    from google.cloud import dialogflow

    session_client = dialogflow.SessionsClient()

    # Note: hard coding audio_encoding and sample_rate_hertz for simplicity.
    audio_encoding = dialogflow.AudioEncoding.AUDIO_ENCODING_LINEAR_16
    sample_rate_hertz = 16000

    session = session_client.session_path(project_id, session_id)
    print("Session path: {}\n".format(session))

    with open(audio_file_path, "rb") as audio_file:
        input_audio = audio_file.read()

    audio_config = dialogflow.InputAudioConfig(
        audio_encoding=audio_encoding,
        language_code=language_code,
        sample_rate_hertz=sample_rate_hertz,
    )
    query_input = dialogflow.QueryInput(audio_config=audio_config)

    request = dialogflow.DetectIntentRequest(
        session=session,
        query_input=query_input,
        input_audio=input_audio,
    )
    response = session_client.detect_intent(request=request)

    print("=" * 20)
    print("Query text: {}".format(response.query_result.query_text))
    print(
        "Detected intent: {} (confidence: {})\n".format(
            response.query_result.intent.display_name,
            response.query_result.intent_detection_confidence,
        )
    )
    print("Fulfillment text: {}\n".format(response.query_result.fulfillment_text))

Other languages

C#: Follow the C# setup instructions on the client libraries page and then visit the Dialogflow reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page and then visit the Dialogflow reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page and then visit the Dialogflow reference documentation for Ruby.