
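Dense document text detection tutorial

Audience

The goal of this tutorial is to help you develop applications using Google Cloud Vision API Document Text Detection. It assumes you are familiar with basic programming constructs and techniques, but even if you are a beginning programmer, you should be able to follow along and run this tutorial without difficulty, then use the Cloud Vision API reference documentation to create basic applications.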

Prerequisites

Set up a Cloud Vision API project in the Google Cloud console.
Set up your environment for using Application Default Credentials.

Python

Annotating an image using Document Text OCR

This tutorial walks you through a basic Vision API application that makes a DOCUMENT_TEXT_DETECTION request and then processes the fullTextAnnotation response.

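Note that both standard TEXT_DETECTION and DOCUMENT_TEXT_DETECTION return a fullTextAnnotation, as described below. However, the premium DOCUMENT_TEXT_DETECTION feature has no input character limit. In addition, if both TEXT_DETECTION and DOCUMENT_TEXT_DETECTION are specified in a Cloud Vision request, DOCUMENT_TEXT_DETECTION takes precedence.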
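For readers who want to see what such a request looks like on the wire, the following is a minimal sketch of a JSON body for the REST images:annotate method that asks for DOCUMENT_TEXT_DETECTION. The helper name build_annotate_request and the placeholder bytes are assumptions made for illustration; the tutorial itself uses the Python client library, which builds the request for you.

```python
import base64
import json


def build_annotate_request(image_bytes: bytes) -> dict:
    """Builds an images:annotate request body asking for dense-text OCR.

    Hypothetical helper for illustration; not part of the tutorial code.
    """
    return {
        "requests": [
            {
                # Raw image bytes are base64-encoded inside the JSON body.
                "image": {
                    "content": base64.b64encode(image_bytes).decode("ascii")
                },
                # One feature entry per requested annotation type.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


# Placeholder bytes stand in for the contents of a real image file.
body = build_annotate_request(b"fake image bytes")
print(json.dumps(body, indent=2))
```

Sending this body still requires an authenticated HTTP call, which is outside the scope of this sketch.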

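A fullTextAnnotation is a structured hierarchical response for the UTF-8 text extracted from the image, organized as Pages→Blocks→Paragraphs→Words→Symbols:

Page is a collection of blocks, plus meta-information about the page: sizes and resolutions (the X resolution and Y resolution may differ).
Block represents one "logical" element of the page, for example an area covered by text, or a picture or separator between columns. The text and table blocks contain the main information needed to extract the text.
Paragraph is a structural unit of text representing an ordered sequence of words. By default, words are considered to be separated by word breaks.
Word is the smallest unit of text. It is represented as an array of Symbols.
Symbol represents a character or a punctuation mark.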
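The hierarchy above can be sketched with plain Python objects. The example below uses SimpleNamespace stand-ins that mimic the field names pages, blocks, paragraphs, words, symbols, and text; the stand-in data, make_word, and paragraph_texts are hypothetical and only illustrate how symbols roll up into words and paragraphs.

```python
from types import SimpleNamespace as NS


def make_word(text):
    """Builds a stand-in Word whose symbols are the characters of `text`."""
    return NS(symbols=[NS(text=ch) for ch in text])


# Hypothetical stand-in for response.full_text_annotation:
# one page, one block, two paragraphs.
document = NS(pages=[NS(blocks=[NS(paragraphs=[
    NS(words=[make_word("Dense"), make_word("text")]),
    NS(words=[make_word("OCR")]),
])])])


def paragraph_texts(document):
    """Returns one string per paragraph, joining symbols into words."""
    texts = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                words = [
                    "".join(symbol.text for symbol in word.symbols)
                    for word in paragraph.words
                ]
                # Assumption: one space between words. A real response
                # describes the break after each symbol explicitly.
                texts.append(" ".join(words))
    return texts


print(paragraph_texts(document))  # ['Dense text', 'OCR']
```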

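The fullTextAnnotation can also provide URLs to web images that partially or fully match the image in the request.

The previous textAnnotations OCR output is still supported and is provided in the JSON response as textAnnotations.

Complete code listing

As you read the code, we recommend that you follow along by referring to the Cloud Vision API Python reference.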

import argparse
from enum import Enum

from google.cloud import vision
from PIL import Image, ImageDraw



class FeatureType(Enum):
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5


def draw_boxes(image, bounds, color):
    """Draws a border around the image using the hints in the vector list.

    Args:
        image: the input image object.
        bounds: list of coordinates for the boxes.
        color: the color of the box.

    Returns:
        An image with colored bounds added.
    """
    draw = ImageDraw.Draw(image)

    for bound in bounds:
        draw.polygon(
            [
                bound.vertices[0].x,
                bound.vertices[0].y,
                bound.vertices[1].x,
                bound.vertices[1].y,
                bound.vertices[2].x,
                bound.vertices[2].y,
                bound.vertices[3].x,
                bound.vertices[3].y,
            ],
            None,
            color,
        )
    return image


def get_document_bounds(image_file, feature):
    """Finds the document bounds given an image and feature type.

    Args:
        image_file: path to the image file.
        feature: feature type to detect.

    Returns:
        List of coordinates for the corresponding feature type.
    """
    client = vision.ImageAnnotatorClient()

    bounds = []

    # Use a separate name for the file handle so the path is not shadowed.
    with open(image_file, "rb") as f:
        content = f.read()

    image = vision.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if feature == FeatureType.SYMBOL:
                            bounds.append(symbol.bounding_box)

                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)

                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)

            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds




def render_doc_text(filein, fileout):
    """Outlines document features (blocks, paragraphs and words) given an image.

    Args:
        filein: path to the input image.
        fileout: path to the output image.
    """
    image = Image.open(filein)
    bounds = get_document_bounds(filein, FeatureType.BLOCK)
    draw_boxes(image, bounds, "blue")
    bounds = get_document_bounds(filein, FeatureType.PARA)
    draw_boxes(image, bounds, "red")
    bounds = get_document_bounds(filein, FeatureType.WORD)
    draw_boxes(image, bounds, "yellow")

    if fileout != 0:
        image.save(fileout)
    else:
        image.show()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("detect_file", help="The image for text detection.")
    parser.add_argument("-out_file", help="Optional output file", default=0)
    args = parser.parse_args()

    render_doc_text(args.detect_file, args.out_file)

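This simple application performs the following tasks:

Imports the libraries necessary to run the application
Takes two arguments and passes them to the render_doc_text() function:
- detect_file: the input image file to annotate
- -out_file (optional): the name of the output file; the application writes the input image with polygon boxes drawn on it to this file, or displays the image if the option is omitted
Creates an ImageAnnotatorClient instance to interact with the service
Sends the request and returns the response
Creates an output image with boxes drawn around the text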
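The box-drawing step in the list above depends on turning each bounding box's vertices into the flat coordinate sequence that ImageDraw's polygon() accepts. The sketch below shows that flattening on a stand-in object; polygon_coords and the sample box are hypothetical, and no image is actually drawn.

```python
from types import SimpleNamespace as NS


def polygon_coords(bound):
    """Flattens a bounding box's vertices into [x0, y0, x1, y1, ...]."""
    coords = []
    for vertex in bound.vertices:
        coords.extend([vertex.x, vertex.y])
    return coords


# Stand-in for a bounding box with four corner vertices.
box = NS(vertices=[
    NS(x=10, y=20),
    NS(x=110, y=20),
    NS(x=110, y=60),
    NS(x=10, y=60),
])

print(polygon_coords(box))  # [10, 20, 110, 20, 110, 60, 10, 60]
```

Looping over the vertices, rather than indexing 0 through 3 by hand, is a design choice in this sketch and does not assume exactly four corners.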

A closer look at the code

Importing libraries

import argparse
from enum import Enum

from google.cloud import vision
from PIL import Image, ImageDraw

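We import standard libraries:

argparse, to allow the application to accept input file names as arguments
enum, for the FeatureType enumeration

File I/O uses the built-in open() function, so no separate import is needed for it.

Other imports:

The vision module from the google.cloud package, which provides the ImageAnnotatorClient class for accessing the Vision API and the Image type for constructing requests.
The Image and ImageDraw modules from the PIL library, used to create the output image by drawing boxes on the input image.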

Running the application

parser = argparse.ArgumentParser()
parser.add_argument("detect_file", help="The image for text detection.")
parser.add_argument("-out_file", help="Optional output file", default=0)
args = parser.parse_args()

render_doc_text(args.detect_file, args.out_file)

Here, we simply parse the passed-in arguments and pass them to the render_doc_text() function.
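To see what the parser produces, the sketch below builds the same two arguments and parses explicit argument lists instead of the real command line. The build_parser wrapper is a hypothetical convenience for this example; the argument definitions are copied from the listing.

```python
import argparse


def build_parser():
    """Returns a parser with the same two arguments as the tutorial."""
    parser = argparse.ArgumentParser()
    parser.add_argument("detect_file", help="The image for text detection.")
    parser.add_argument("-out_file", help="Optional output file", default=0)
    return parser


# Passing a list to parse_args() avoids reading sys.argv.
with_output = build_parser().parse_args(["receipt.jpg", "-out_file", "out.jpg"])
without_output = build_parser().parse_args(["receipt.jpg"])

print(with_output.detect_file, with_output.out_file)  # receipt.jpg out.jpg
print(without_output.out_file)  # 0
```

When -out_file is omitted, out_file keeps its default of 0, which is the value render_doc_text() checks before deciding whether to save or display the image.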

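Authenticating to the API

Before communicating with the Vision API service, you must authenticate your service using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). By default, the Cloud client library attempts to obtain credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account's JSON key file (see Setting Up a Service Account for more information).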
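Before running the application, it can help to confirm that the environment variable is set and points at a real file. The following is a minimal sketch of such a check using only the standard library; describe_adc_key_file is a hypothetical helper, and it does not validate the key itself or cover the other credential sources that ADC can fall back on.

```python
import os


def describe_adc_key_file(environ=None):
    """Reports whether GOOGLE_APPLICATION_CREDENTIALS points at a file.

    Hypothetical helper: returns "unset", "missing file", or "ok".
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    return "ok" if os.path.isfile(path) else "missing file"


# Explicit dictionaries stand in for the process environment here.
print(describe_adc_key_file({}))
print(describe_adc_key_file(
    {"GOOGLE_APPLICATION_CREDENTIALS": "/no/such/key.json"}
))
```

Calling describe_adc_key_file() with no argument checks the real process environment.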

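Making the API request and reading text bounds from the response

Now that our Vision API service is ready, we can access the service by calling the document_text_detection method of the ImageAnnotatorClient instance.

The client library encapsulates the details of requests to and responses from the API. See the Vision API reference for complete information on the structure of a request.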

def get_document_bounds(image_file, feature):
    """Finds the document bounds given an image and feature type.

    Args:
        image_file: path to the image file.
        feature: feature type to detect.

    Returns:
        List of coordinates for the corresponding feature type.
    """
    client = vision.ImageAnnotatorClient()

    bounds = []

    # Use a separate name for the file handle so the path is not shadowed.
    with open(image_file, "rb") as f:
        content = f.read()

    image = vision.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if feature == FeatureType.SYMBOL:
                            bounds.append(symbol.bounding_box)

                    if feature == FeatureType.WORD:
                        bounds.append(word.bounding_box)

                if feature == FeatureType.PARA:
                    bounds.append(paragraph.bounding_box)

            if feature == FeatureType.BLOCK:
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds

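After the client library has handled the request, our response is an AnnotateImageResponse, which holds the annotation results for the image sent in the request. Because we sent only one image, we walk through its full TextAnnotation (response.full_text_annotation) and collect the boundaries for the specified document feature.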
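Because render_doc_text() calls get_document_bounds() once per feature type, the image is sent to the API three times. One possible alternative, sketched here under the assumption that a single response is reused, is to walk the hierarchy once and collect bounds for every level. The function collect_all_bounds and the SimpleNamespace stand-in document are hypothetical, and the bounding boxes are placeholder strings rather than real polygons.

```python
from collections import defaultdict
from enum import Enum
from types import SimpleNamespace as NS


class FeatureType(Enum):
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5


def collect_all_bounds(document):
    """Walks the hierarchy once and groups bounding boxes by feature type."""
    bounds = defaultdict(list)
    for page in document.pages:
        for block in page.blocks:
            bounds[FeatureType.BLOCK].append(block.bounding_box)
            for paragraph in block.paragraphs:
                bounds[FeatureType.PARA].append(paragraph.bounding_box)
                for word in paragraph.words:
                    bounds[FeatureType.WORD].append(word.bounding_box)
                    for symbol in word.symbols:
                        bounds[FeatureType.SYMBOL].append(symbol.bounding_box)
    return bounds


# Stand-in document: one block, one paragraph, one word, two symbols.
word = NS(bounding_box="word-box",
          symbols=[NS(bounding_box="s1"), NS(bounding_box="s2")])
paragraph = NS(bounding_box="para-box", words=[word])
block = NS(bounding_box="block-box", paragraphs=[paragraph])
document = NS(pages=[NS(blocks=[block])])

bounds = collect_all_bounds(document)
print({feature.name: boxes for feature, boxes in bounds.items()})
```

With real data, document would be response.full_text_annotation, and each list could be passed to draw_boxes() with its own color.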

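Running the application

To run the application, you can download this receipt.jpg file (you may need to right-click the link), then pass the location where you downloaded the file on your local machine to the tutorial application (doctext.py).

Here is the Python command, followed by a description of the text annotation output image.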

$ python doctext.py receipt.jpg -out_file out.jpg

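The output image shows words in yellow boxes, paragraphs in red boxes, and blocks in blue boxes.

Congratulations! You've performed text detection using Google Cloud Vision full text annotations!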
