Detect intent from streaming audio input

This page shows how to stream audio input to a detect intent request using the API. Dialogflow processes the audio and converts it to text before attempting an intent match. This conversion is known as audio input, speech recognition, speech-to-text, or STT.

Before you begin

This feature only applies when you use the API for end-user interactions. If you use an integration, you can skip this guide.

Before reading this guide, do the following:

  1. Read the Dialogflow basics.
  2. Perform the setup steps.

Create an agent

If you have not already created an agent, create one now:

  1. Go to the Dialogflow ES Console.
  2. If requested, sign in to the Dialogflow Console. See the Dialogflow Console overview for more information.
  3. Click Create Agent in the left sidebar menu. (If you already have other agents, click the agent name, scroll to the bottom, then click Create new agent.)
  4. Enter your agent's name, default language, and default time zone.
  5. If you have already created a project, enter that project. If you want to allow the Dialogflow Console to create the project, select Create a new Google project.
  6. Click the Create button.

Import the example file to your agent

The steps in this guide make assumptions about your agent, so you need to import the agent prepared for this guide. The import steps use the restore option, which overwrites all agent settings, intents, and entities.

To import the file, follow these steps:

  1. Download the room-booking-agent.zip file.
  2. Go to the Dialogflow ES Console.
  3. Select your agent.
  4. Click the settings button next to the agent name.
  5. Select the Export and Import tab.
  6. Select Restore From Zip and follow the instructions to restore the zip file that you downloaded.

Streaming basics

The Session type's streamingDetectIntent method returns a bidirectional gRPC streaming object. The methods available for this object vary by programming language, so see the reference documentation for your client library for details.

The streaming object is used to send and receive data concurrently. Using this object, your client streams audio content to Dialogflow while simultaneously listening for StreamingDetectIntentResponse messages.

The streamingDetectIntent method has a query_input.audio_config.single_utterance parameter that affects speech recognition:

  • If false (the default), speech recognition does not cease until the client closes the stream.
  • If true, Dialogflow detects a single spoken utterance in the input audio. When Dialogflow detects that the voice in the audio has stopped or paused, it ceases speech recognition and sends a StreamingDetectIntentResponse with a recognition result of END_OF_SINGLE_UTTERANCE to your client. Dialogflow ignores any audio sent on the stream after it sends END_OF_SINGLE_UTTERANCE. (A configuration sketch follows this list.)
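
The following is a minimal sketch of where this parameter is set, using the Python client shown in the samples below; the encoding, sample rate, and language values here are illustrative assumptions:

from google.cloud import dialogflow

# single_utterance is part of the InputAudioConfig carried by the first
# streaming request in query_input.audio_config.
audio_config = dialogflow.InputAudioConfig(
    audio_encoding=dialogflow.AudioEncoding.AUDIO_ENCODING_LINEAR_16,
    sample_rate_hertz=16000,  # assumes 16 kHz LINEAR16 audio
    language_code="en-US",
    single_utterance=True,  # stop recognition after one detected utterance
)
query_input = dialogflow.QueryInput(audio_config=audio_config)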

In bidirectional streaming, a client can half-close the stream object to signal to the server that it won't send more data. For example, this method is called closeSend in Java and Go. It's important to half-close (but not abort) streams in the following situations:

  • Your client has finished sending data.
  • Your client is configured with single_utterance set to true, and it receives a StreamingDetectIntentResponse with a recognition result of END_OF_SINGLE_UTTERANCE.

After closing a stream, your client should start a new request with a new stream as needed.
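
The Python client has no closeSend method; a gRPC stream is half-closed when the request generator stops yielding. The following sketch illustrates that pattern, assuming a hypothetical stop_event that your response loop sets once it receives END_OF_SINGLE_UTTERANCE:

import threading

from google.cloud import dialogflow

stop_event = threading.Event()  # hypothetical: set by your response loop

def request_generator(session_path, query_input, audio_chunks):
    # The first request carries only the configuration.
    yield dialogflow.StreamingDetectIntentRequest(
        session=session_path, query_input=query_input
    )
    for chunk in audio_chunks:
        if stop_event.is_set():
            return  # ending the generator half-closes the stream
        yield dialogflow.StreamingDetectIntentRequest(input_audio=chunk)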

Streaming detect intent

The following samples use the Session type's streamingDetectIntent method to stream audio.

Go

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"

	dialogflow "cloud.google.com/go/dialogflow/apiv2"
	dialogflowpb "cloud.google.com/go/dialogflow/apiv2/dialogflowpb"
)

func DetectIntentStream(projectID, sessionID, audioFile, languageCode string) (string, error) {
	ctx := context.Background()

	sessionClient, err := dialogflow.NewSessionsClient(ctx)
	if err != nil {
		return "", err
	}
	defer sessionClient.Close()

	if projectID == "" || sessionID == "" {
		return "", fmt.Errorf("detect.DetectIntentStream empty project (%s) or session (%s)", projectID, sessionID)
	}

	sessionPath := fmt.Sprintf("projects/%s/agent/sessions/%s", projectID, sessionID)

	// In this example, we hard code the encoding and sample rate for simplicity.
	audioConfig := dialogflowpb.InputAudioConfig{AudioEncoding: dialogflowpb.AudioEncoding_AUDIO_ENCODING_LINEAR_16, SampleRateHertz: 16000, LanguageCode: languageCode}

	queryAudioInput := dialogflowpb.QueryInput_AudioConfig{AudioConfig: &audioConfig}

	queryInput := dialogflowpb.QueryInput{Input: &queryAudioInput}

	streamer, err := sessionClient.StreamingDetectIntent(ctx)
	if err != nil {
		return "", err
	}

	f, err := os.Open(audioFile)
	if err != nil {
		return "", err
	}

	defer f.Close()

	go func() {
		audioBytes := make([]byte, 1024)

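		// The first request must contain only the session path and the
		// audio configuration; the audio itself follows in later requests.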
		request := dialogflowpb.StreamingDetectIntentRequest{Session: sessionPath, QueryInput: &queryInput}
		err = streamer.Send(&request)
		if err != nil {
			log.Fatal(err)
		}

		for {
			n, err := f.Read(audioBytes)
			if err == io.EOF {
				// The file is exhausted; half-close the stream.
				streamer.CloseSend()
				break
			}
			if err != nil {
				log.Fatal(err)
			}

			// Send only the bytes actually read from the file.
			request = dialogflowpb.StreamingDetectIntentRequest{InputAudio: audioBytes[:n]}
			err = streamer.Send(&request)
			if err != nil {
				log.Fatal(err)
			}
		}
	}()

	var queryResult *dialogflowpb.QueryResult

	for {
		response, err := streamer.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}

		recognitionResult := response.GetRecognitionResult()
		transcript := recognitionResult.GetTranscript()
		log.Printf("Recognition transcript: %s\n", transcript)

		queryResult = response.GetQueryResult()
	}

	fulfillmentText := queryResult.GetFulfillmentText()
	return fulfillmentText, nil
}

Java

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.api.gax.rpc.ApiException;
import com.google.api.gax.rpc.BidiStream;
import com.google.cloud.dialogflow.v2.AudioEncoding;
import com.google.cloud.dialogflow.v2.InputAudioConfig;
import com.google.cloud.dialogflow.v2.QueryInput;
import com.google.cloud.dialogflow.v2.QueryResult;
import com.google.cloud.dialogflow.v2.SessionName;
import com.google.cloud.dialogflow.v2.SessionsClient;
import com.google.cloud.dialogflow.v2.StreamingDetectIntentRequest;
import com.google.cloud.dialogflow.v2.StreamingDetectIntentResponse;
import com.google.protobuf.ByteString;
import java.io.FileInputStream;
import java.io.IOException;

class DetectIntentStream {

  // Dialogflow API Detect Intent sample that processes an audio file as an audio stream.
  static void detectIntentStream(String projectId, String audioFilePath, String sessionId)
      throws IOException, ApiException {
    // String projectId = "YOUR_PROJECT_ID";
    // String audioFilePath = "path_to_your_audio_file";
    // Using the same `sessionId` between requests allows continuation of the conversation.
    // String sessionId = "Identifier of the DetectIntent session";

    // Instantiates a client
    try (SessionsClient sessionsClient = SessionsClient.create()) {
      // Set the session name using the sessionId (UUID) and projectID (my-project-id)
      SessionName session = SessionName.of(projectId, sessionId);

      // Instructs the speech recognizer how to process the audio content.
      // Note: hard coding audioEncoding and sampleRateHertz for simplicity.
      // Audio encoding of the audio content sent in the query request.
      InputAudioConfig inputAudioConfig =
          InputAudioConfig.newBuilder()
              .setAudioEncoding(AudioEncoding.AUDIO_ENCODING_LINEAR_16)
              .setLanguageCode("en-US") // languageCode = "en-US"
              .setSampleRateHertz(16000) // sampleRateHertz = 16000
              .build();

      // Build the query with the InputAudioConfig
      QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();

      // Create the Bidirectional stream
      BidiStream<StreamingDetectIntentRequest, StreamingDetectIntentResponse> bidiStream =
          sessionsClient.streamingDetectIntentCallable().call();

      // The first request must **only** contain the audio configuration:
      bidiStream.send(
          StreamingDetectIntentRequest.newBuilder()
              .setSession(session.toString())
              .setQueryInput(queryInput)
              .build());

      try (FileInputStream audioStream = new FileInputStream(audioFilePath)) {
        // Subsequent requests must **only** contain the audio data.
        // Following messages: audio chunks. We just read the file in fixed-size chunks. In reality
        // you would split the user input by time.
        byte[] buffer = new byte[4096];
        int bytes;
        while ((bytes = audioStream.read(buffer)) != -1) {
          bidiStream.send(
              StreamingDetectIntentRequest.newBuilder()
                  .setInputAudio(ByteString.copyFrom(buffer, 0, bytes))
                  .build());
        }
      }

      // Tell the service you are done sending data
      bidiStream.closeSend();

      for (StreamingDetectIntentResponse response : bidiStream) {
        QueryResult queryResult = response.getQueryResult();
        System.out.println("====================");
        System.out.format("Intent Display Name: %s\n", queryResult.getIntent().getDisplayName());
        System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
        System.out.format(
            "Detected Intent: %s (confidence: %f)\n",
            queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
        System.out.format(
            "Fulfillment Text: '%s'\n",
            queryResult.getFulfillmentMessagesCount() > 0
                ? queryResult.getFulfillmentMessages(0).getText()
                : "Triggered Default Fallback Intent");
      }
    }
  }
}

Node.js

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const fs = require('fs');
const util = require('util');
const {Transform, pipeline} = require('stream');
const {struct} = require('pb-util');

const pump = util.promisify(pipeline);
// Imports the Dialogflow library
const dialogflow = require('@google-cloud/dialogflow');

// Instantiates a session client
const sessionClient = new dialogflow.SessionsClient();

// The path to the local file on which to perform speech recognition, e.g.
// /path/to/audio.raw
// const filename = '/path/to/audio.raw';

// The encoding of the audio file, e.g. 'AUDIO_ENCODING_LINEAR_16'
// const encoding = 'AUDIO_ENCODING_LINEAR_16';

// The sample rate of the audio file in hertz, e.g. 16000
// const sampleRateHertz = 16000;

// The BCP-47 language code to use, e.g. 'en-US'
// const languageCode = 'en-US';
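
// The project and session IDs used to build the session path, e.g.
// const projectId = 'my-project-id';
// const sessionId = 'user-session-id';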
const sessionPath = sessionClient.projectAgentSessionPath(
  projectId,
  sessionId
);

const initialStreamRequest = {
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: encoding,
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode,
    },
  },
};

// Create a stream for the streaming request.
const detectStream = sessionClient
  .streamingDetectIntent()
  .on('error', console.error)
  .on('data', data => {
    if (data.recognitionResult) {
      console.log(
        `Intermediate transcript: ${data.recognitionResult.transcript}`
      );
    } else {
      console.log('Detected intent:');

      const result = data.queryResult;
      // Instantiates a context client
      const contextClient = new dialogflow.ContextsClient();

      console.log(`  Query: ${result.queryText}`);
      console.log(`  Response: ${result.fulfillmentText}`);
      if (result.intent) {
        console.log(`  Intent: ${result.intent.displayName}`);
      } else {
        console.log('  No intent matched.');
      }
      const parameters = JSON.stringify(struct.decode(result.parameters));
      console.log(`  Parameters: ${parameters}`);
      if (result.outputContexts && result.outputContexts.length) {
        console.log('  Output contexts:');
        result.outputContexts.forEach(context => {
          const contextId =
            contextClient.matchContextFromProjectAgentSessionContextName(
              context.name
            );
          const contextParameters = JSON.stringify(
            struct.decode(context.parameters)
          );
          console.log(`    ${contextId}`);
          console.log(`      lifespan: ${context.lifespanCount}`);
          console.log(`      parameters: ${contextParameters}`);
        });
      }
    }
  });

// Write the initial stream request to config for audio input.
detectStream.write(initialStreamRequest);

// Stream an audio file from disk to the Conversation API, e.g.
// "./resources/audio.raw"
await pump(
  fs.createReadStream(filename),
  // Format the audio stream into the request format.
  new Transform({
    objectMode: true,
    transform: (obj, _, next) => {
      next(null, {inputAudio: obj});
    },
  }),
  detectStream
);

Python

To authenticate to Dialogflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

def detect_intent_stream(project_id, session_id, audio_file_path, language_code):
    """Returns the result of detect intent with streaming audio as input.

    Using the same `session_id` between requests allows continuation
    of the conversation."""
    from google.cloud import dialogflow

    session_client = dialogflow.SessionsClient()

    # Note: hard coding audio_encoding and sample_rate_hertz for simplicity.
    audio_encoding = dialogflow.AudioEncoding.AUDIO_ENCODING_LINEAR_16
    sample_rate_hertz = 16000

    session_path = session_client.session_path(project_id, session_id)
    print("Session path: {}\n".format(session_path))

    def request_generator(audio_config, audio_file_path):
        query_input = dialogflow.QueryInput(audio_config=audio_config)

        # The first request contains the configuration.
        yield dialogflow.StreamingDetectIntentRequest(
            session=session_path, query_input=query_input
        )

        # Here we are reading small chunks of audio data from a local
        # audio file.  In practice these chunks should come from
        # an audio input device.
        with open(audio_file_path, "rb") as audio_file:
            while True:
                chunk = audio_file.read(4096)
                if not chunk:
                    break
                # The subsequent requests contain audio data.
                yield dialogflow.StreamingDetectIntentRequest(input_audio=chunk)

    audio_config = dialogflow.InputAudioConfig(
        audio_encoding=audio_encoding,
        language_code=language_code,
        sample_rate_hertz=sample_rate_hertz,
    )

    requests = request_generator(audio_config, audio_file_path)
    responses = session_client.streaming_detect_intent(requests=requests)

    print("=" * 20)
    for response in responses:
        print(
            'Intermediate transcript: "{}".'.format(
                response.recognition_result.transcript
            )
        )
        # Note: Python gRPC has no closeSend method, so to stop processing
        # the audio after a result is recognized, you can close the channel
        # manually to prevent further iteration.
        # Keep in mind that if there is a silence chunk in the audio, the part
        # after it might be missed because of the early teardown.
        # https://cloud.google.com/dialogflow/es/docs/how/detect-intent-stream#streaming_basics
        if response.recognition_result.is_final:
            session_client.transport.close()
            break

    # Note: The result from the last response is the final transcript along
    # with the detected content.
    query_result = response.query_result

    print("=" * 20)
    print("Query text: {}".format(query_result.query_text))
    print(
        "Detected intent: {} (confidence: {})\n".format(
            query_result.intent.display_name, query_result.intent_detection_confidence
        )
    )
    print("Fulfillment text: {}\n".format(query_result.fulfillment_text))

Other languages

C#: Follow the C# setup instructions on the client libraries page and then visit the Dialogflow reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page and then visit the Dialogflow reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page and then visit the Dialogflow reference documentation for Ruby.

Samples

For best practices on streaming audio from a browser microphone to Dialogflow, see the samples page.