
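Detect text

Text detection performs optical character recognition (OCR), which detects and extracts text from an input video.

Text detection is available for all languages supported by the Cloud Vision API.

Request text detection for a video on Cloud Storage

The following samples demonstrate text detection on a file located in Cloud Storage.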

REST

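Send a video annotation request

The following shows how to send a POST request to the videos:annotate method. The example uses the Google Cloud CLI to create an access token. For instructions on installing the gcloud CLI, see the Video Intelligence API quickstart.

Before using any of the request data, make the following replacements:

INPUT_URI: the Cloud Storage bucket that contains the file you want to annotate, including the file name. Must start with gs://.
For example: "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
LANGUAGE_CODE: [Optional] For example, "en-US"
PROJECT_NUMBER: the numeric identifier of your Google Cloud project

HTTP method and URL: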

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputUri": "INPUT_URI",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}
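
As a minimal sketch, the request body above could be generated by a short Python script instead of being written by hand. The helper name build_request and the example values are illustrative assumptions; only the field names come from the JSON body shown above.

```python
import json


def build_request(input_uri, language_code=None):
    """Return the videos:annotate request body as a dict (hypothetical helper)."""
    if not input_uri.startswith("gs://"):
        raise ValueError("INPUT_URI must start with gs://")
    body = {"inputUri": input_uri, "features": ["TEXT_DETECTION"]}
    # LANGUAGE_CODE is optional, so videoContext is added only when a hint is given.
    if language_code:
        body["videoContext"] = {
            "textDetectionConfig": {"languageHints": [language_code]}
        }
    return body


if __name__ == "__main__":
    request = build_request(
        "gs://cloud-videointelligence-demo/assistant.mp4", "en-US"
    )
    # Print the body; redirect the output to request.json to use it with curl.
    print(json.dumps(request, indent=2))
```

Omitting the language hint leaves videoContext out of the body entirely, which avoids sending an empty placeholder string to the API.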

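To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: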

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

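Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: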

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

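If the request is successful, the Video Intelligence API returns the name of your operation. The above shows an example of such a response, where:

PROJECT_NUMBER: the number of your project
LOCATION_ID: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region is determined based on the video file location.
OPERATION_ID: the ID of the long-running operation created for the request and provided in the response when you started the operation, for example 12345...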
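
The three parts of the operation name can be pulled apart with a small parser. This is only a sketch: the function name parse_operation_name and the sample identifiers are made up for illustration, and the pattern assumes the name always follows the format shown above.

```python
import re

# Pattern for projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
_OPERATION_NAME = re.compile(
    r"^projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location_id>[^/]+)"
    r"/operations/(?P<operation_id>[^/]+)$"
)


def parse_operation_name(name):
    """Split an operation name into its three components (hypothetical helper)."""
    match = _OPERATION_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected operation name: {name!r}")
    return match.groupdict()


# The identifiers below are invented sample values, not real resources.
parts = parse_operation_name(
    "projects/123456789/locations/us-west1/operations/987654321"
)
print(parts["location_id"])
```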

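Get annotation results

To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID
PROJECT_NUMBER: the numeric identifier of your Google Cloud project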
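
To make the substitution concrete, the following sketch assembles the URL and headers for the GET request from these two values. The helper build_get_request is hypothetical, and the access token is assumed to come from gcloud auth print-access-token, as in the commands on this page; the sketch does not send the request.

```python
BASE_URL = "https://videointelligence.googleapis.com/v1/"


def build_get_request(operation_name, project_number, access_token):
    """Return (url, headers) for fetching an operation (hypothetical helper)."""
    # The operation name is appended to the versioned base URL unchanged.
    url = BASE_URL + operation_name.lstrip("/")
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-goog-user-project": project_number,
    }
    return url, headers


# Placeholder values for illustration only.
url, headers = build_get_request(
    "projects/123456789/locations/us-west1/operations/987654321",
    "123456789",
    "ACCESS_TOKEN",
)
print(url)
```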

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

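Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: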

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
  }

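Text detection annotations are returned as a textAnnotations list. Note: The done field is returned only when its value is True. It is not included in responses for operations that have not completed.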
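
Because done is absent until the operation finishes, client code should treat a missing field as "not done". The sketch below shows one way to do that on a parsed JSON response. It assumes the annotations are nested under response.annotationResults, as is usual for long-running operation payloads; the sample above shows only the textAnnotations fragment, so treat that nesting and both helper names as assumptions.

```python
def text_annotations_if_done(operation):
    """Return the textAnnotations list, or None while the operation is running."""
    # A missing "done" key is treated the same as done == False.
    if not operation.get("done", False):
        return None
    results = operation.get("response", {}).get("annotationResults", [])
    if not results:
        return []
    return results[0].get("textAnnotations", [])


def offset_seconds(offset):
    """Convert an offset string such as "0.833333s" into float seconds."""
    return float(offset.rstrip("s"))


# Trimmed, illustrative payload that reuses values from the sample response.
operation = {
    "done": True,
    "response": {
        "annotationResults": [
            {
                "textAnnotations": [
                    {
                        "text": "Hair Salon",
                        "segments": [
                            {
                                "segment": {
                                    "startTimeOffset": "0.833333s",
                                    "endTimeOffset": "2.291666s",
                                },
                                "confidence": 0.99438506,
                            }
                        ],
                    }
                ]
            }
        ]
    },
}

annotations = text_annotations_if_done(operation)
for annotation in annotations or []:
    segment = annotation["segments"][0]
    start = offset_seconds(segment["segment"]["startTimeOffset"])
    end = offset_seconds(segment["segment"]["endTimeOffset"])
    print(f'{annotation["text"]}: {start:.3f}s to {end:.3f}s')
```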

Download annotation results

Copy the annotation from the source to the destination bucket (see Copy files and objects):

gcloud storage cp gcs_uri gs://my-bucket

Note: If you provide an output GCS URI, the annotation is stored at that GCS URI.

Go


import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetectionGCS analyzes a video and extracts the text that appears in its frames.
func textDetectionGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://python-docs-samples-tests/video/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

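To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.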

/**
 * Detect Text in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults detectTextGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // asynchronously perform text detection on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment, display its time offset
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

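To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.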

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    console.log(
      ` Start: ${time.startTimeOffset.seconds || 0}.${(
        time.startTimeOffset.nanos / 1e6
      ).toFixed(0)}s`
    );
    console.log(
      ` End: ${time.endTimeOffset.seconds || 0}.${(
        time.endTimeOffset.nanos / 1e6
      ).toFixed(0)}s`
    );
    console.log(` Confidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

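To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.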

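"""Detect text in a video stored on GCS."""
from google.cloud import videointelligence

# TODO(developer): Set input_uri to the Cloud Storage URI of the video to
# analyze before running the sample, e.g. "gs://my-bucket/my-video.mp4".
# input_uri = "gs://my-bucket/my-video.mp4"

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.TEXT_DETECTION]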

operation = video_client.annotate_video(
    request={"features": features, "input_uri": input_uri}
)

print("\nProcessing video for text detection.")
result = operation.result(timeout=600)

# The first result is retrieved because a single video was processed.
annotation_result = result.annotation_results[0]

for text_annotation in annotation_result.text_annotations:
    print("\nText: {}".format(text_annotation.text))

    # Get the first text segment
    text_segment = text_annotation.segments[0]
    start_time = text_segment.segment.start_time_offset
    end_time = text_segment.segment.end_time_offset
    print(
        "start_time: {}, end_time: {}".format(
            start_time.seconds + start_time.microseconds * 1e-6,
            end_time.seconds + end_time.microseconds * 1e-6,
        )
    )

    print("Confidence: {}".format(text_segment.confidence))

    # Show the result for the first frame in this segment.
    frame = text_segment.frames[0]
    time_offset = frame.time_offset
    print(
        "Time offset for the first frame: {}".format(
            time_offset.seconds + time_offset.microseconds * 1e-6
        )
    )
    print("Rotated Bounding Box Vertices:")
    for vertex in frame.rotated_bounding_box.vertices:
        print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))

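Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for Ruby.

Request text detection for a video from a local file

The following samples demonstrate text detection on a file stored locally.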

REST

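Send a video annotation request

To annotate a local video file, base64-encode the contents of the video file and include the base64-encoded contents in the inputContent field of the request. For information on how to base64-encode the contents of a video file, see Base64 Encoding.

The following shows how to send a POST request to the videos:annotate method. The example uses the Google Cloud CLI to create an access token. For instructions on installing the Google Cloud CLI, see the Video Intelligence API quickstart.

Before using any of the request data, make the following replacements:

"inputContent": BASE64_ENCODED_CONTENT
For example:
"UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
LANGUAGE_CODE: [Optional] For example, "en-US"
PROJECT_NUMBER: the numeric identifier of your Google Cloud project

HTTP method and URL: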

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}
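
The base64 step and the request body above can be combined in a few lines of Python. This is a sketch under stated assumptions: build_local_request is a hypothetical helper, and the four sample bytes stand in for real video data, which you would normally read from a file opened in binary mode.

```python
import base64
import json


def build_local_request(video_bytes, language_code=None):
    """Return a videos:annotate body with inline video data (hypothetical helper)."""
    body = {
        # JSON cannot carry raw bytes, so the content is sent as base64 text.
        "inputContent": base64.b64encode(video_bytes).decode("ascii"),
        "features": ["TEXT_DETECTION"],
    }
    # LANGUAGE_CODE is optional, so videoContext is added only when a hint is given.
    if language_code:
        body["videoContext"] = {
            "textDetectionConfig": {"languageHints": [language_code]}
        }
    return body


# Stand-in bytes for illustration; real input would be the full video file.
sample = build_local_request(b"RIFF", "en-US")
print(json.dumps(sample, indent=2))
```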

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

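Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: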

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

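If the request is successful, the Video Intelligence API returns the name of your operation. The above shows an example of such a response, where PROJECT_NUMBER is the number of your project and OPERATION_ID is the ID of the long-running operation created for the request.

OPERATION_ID: the ID provided in the response when you started the operation, for example 12345...

Get annotation results

To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID
PROJECT_NUMBER: the numeric identifier of your Google Cloud project

HTTP method and URL: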

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

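Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: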

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
}
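
The x and y values in each rotatedBoundingBox are normalized to the frame size (the Java sample reads them as NormalizedVertex), so drawing a box requires scaling by the frame dimensions. The following sketch does that conversion. The function name to_pixels and the 1280x720 frame size are assumptions for illustration, and missing coordinates are treated as 0 on the assumption that zero values may be omitted from the JSON.

```python
def to_pixels(vertices, frame_width, frame_height):
    """Scale normalized vertices to integer pixel coordinates (hypothetical helper)."""
    return [
        (
            round(vertex.get("x", 0.0) * frame_width),
            round(vertex.get("y", 0.0) * frame_height),
        )
        for vertex in vertices
    ]


# Vertices of the first "Hair Salon" frame from the sample response.
vertices = [
    {"x": 0.7015625, "y": 0.59583336},
    {"x": 0.7984375, "y": 0.59583336},
    {"x": 0.7984375, "y": 0.64166665},
    {"x": 0.7015625, "y": 0.64166665},
]

# 1280x720 is an assumed frame size; use the real dimensions of your video.
corners = to_pixels(vertices, 1280, 720)
print(corners)
```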

Go


import (
	"context"
	"fmt"
	"io"
	"os"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetection analyzes a local video file and extracts the text that appears in its frames.
func textDetection(w io.Writer, filename string) error {
	// filename := "../testdata/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := os.ReadFile(filename)
	if err != nil {
		return fmt.Errorf("os.ReadFile: %w", err)
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

/**
 * Detect text in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults detectText(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // asynchronously perform text detection on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment, display its time offset
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

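To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.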

// Imports the Google Cloud Video Intelligence library + Node's fs library
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();

// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    if (time.startTimeOffset.seconds === undefined) {
      time.startTimeOffset.seconds = 0;
    }
    if (time.startTimeOffset.nanos === undefined) {
      time.startTimeOffset.nanos = 0;
    }
    if (time.endTimeOffset.seconds === undefined) {
      time.endTimeOffset.seconds = 0;
    }
    if (time.endTimeOffset.nanos === undefined) {
      time.endTimeOffset.nanos = 0;
    }
    console.log(
      `\tStart: ${time.startTimeOffset.seconds || 0}` +
        `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(
      `\tEnd: ${time.endTimeOffset.seconds || 0}.` +
        `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(`\tConfidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

import io

from google.cloud import videointelligence

def video_detect_text(path):
    """Detect text in a local video."""
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.Feature.TEXT_DETECTION]
    video_context = videointelligence.VideoContext()

    with io.open(path, "rb") as file:
        input_content = file.read()

    operation = video_client.annotate_video(
        request={
            "features": features,
            "input_content": input_content,
            "video_context": video_context,
        }
    )

    print("\nProcessing video for text detection.")
    result = operation.result(timeout=300)

    # The first result is retrieved because a single video was processed.
    annotation_result = result.annotation_results[0]

    for text_annotation in annotation_result.text_annotations:
        print("\nText: {}".format(text_annotation.text))

        # Get the first text segment
        text_segment = text_annotation.segments[0]
        start_time = text_segment.segment.start_time_offset
        end_time = text_segment.segment.end_time_offset
        print(
            "start_time: {}, end_time: {}".format(
                start_time.seconds + start_time.microseconds * 1e-6,
                end_time.seconds + end_time.microseconds * 1e-6,
            )
        )

        print("Confidence: {}".format(text_segment.confidence))

        # Show the result for the first frame in this segment.
        frame = text_segment.frames[0]
        time_offset = frame.time_offset
        print(
            "Time offset for the first frame: {}".format(
                time_offset.seconds + time_offset.microseconds * 1e-6
            )
        )
        print("Rotated Bounding Box Vertices:")
        for vertex in frame.rotated_bounding_box.vertices:
            print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))

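Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for Ruby.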