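Detect text in files (PDF/TIFF)

The Vision API can detect and transcribe text from PDF and TIFF files stored in Cloud Storage.

To request text detection for PDF and TIFF documents, you must use the files:asyncBatchAnnotate function, which performs an offline (asynchronous) request and reports its status through the operations resources.

The output of a PDF/TIFF request is written to a JSON file created in the specified Cloud Storage bucket.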

Limitations

The Vision API accepts PDF/TIFF files of up to 2,000 pages. Larger files return an error.

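Authentication

API keys are not supported for files:asyncBatchAnnotate requests. For instructions on authenticating with a service account, see Using a service account.

The account used for authentication must have access to the Cloud Storage bucket that you specify for the output (roles/editor or roles/storage.objectCreator or above).

You can use an API key to query the status of the operation. For details, see Using an API key.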

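Document text detection requests

Currently, PDF/TIFF document detection is available only for files stored in Cloud Storage buckets. The response JSON files are also saved to a Cloud Storage bucket.

2010 US Census PDF page: `gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf` (source: United States Census Bureau)

Note: This feature returns results that contain normalizedVertices in the range [0,1], not actual pixel values (vertices).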

REST

Before using any of the request data, make the following replacements:

CLOUD_STORAGE_BUCKET: the Cloud Storage bucket/directory in which to save the output files, expressed in the following form:
- gs://bucket/directory/
The requesting user must have write permission to the bucket.
CLOUD_STORAGE_FILE_URI: the path to a valid file (PDF/TIFF) in a Cloud Storage bucket. You must have at least read permission to the file. For example:
- ```
gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf
```
FEATURE_TYPE: a valid feature type. For files:asyncBatchAnnotate requests, you can use the following feature types:
- DOCUMENT_TEXT_DETECTION
- TEXT_DETECTION
PROJECT_ID: your Google Cloud project ID.

Field-specific considerations:

inputConfig - replaces the image field used in other Vision API requests. It contains two child fields:
- gcsSource.uri - the Google Cloud Storage URI of the PDF or TIFF file (accessible to the user or service account making the request).
- mimeType - one of the accepted file types: application/pdf or image/tiff.
outputConfig - specifies output details. It contains two child fields:
- gcsDestination.uri - a valid Google Cloud Storage URI. The bucket must be writable by the user or service account making the request. The file name is output-x-to-y, where x and y are the PDF/TIFF page numbers included in that output file. If the file already exists, its contents are overwritten.
- batchSize - specifies how many pages of output to include in each output JSON file.
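
The naming rule for the output files can be sketched in a few lines of Python. This is only an illustration, not part of the Vision API: `expected_output_names` is a hypothetical helper, the `.json` suffix is inferred from the sample file name output-1-to-1.json shown on this page, and the sketch assumes that the last file holds the remaining pages when the page count is not a multiple of batchSize.

```python
def expected_output_names(page_count, batch_size):
    """Return the output file names expected for a file with page_count pages.

    Hypothetical helper: it follows the output-x-to-y pattern described above
    and assumes the final batch contains whatever pages are left over.
    """
    if page_count < 1 or batch_size < 1:
        raise ValueError("page_count and batch_size must be positive")
    names = []
    for first in range(1, page_count + 1, batch_size):
        last = min(first + batch_size - 1, page_count)
        names.append(f"output-{first}-to-{last}.json")
    return names


# A five-page file with batchSize 2 yields three output files.
print(expected_output_names(5, 2))
```

Knowing the expected names in advance can help when checking that every batch was written before you start downloading results.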

HTTP method and URL:

POST https://vision.googleapis.com/v1/files:asyncBatchAnnotate

Request JSON body:

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "CLOUD_STORAGE_FILE_URI"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "FEATURE_TYPE"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "CLOUD_STORAGE_BUCKET"
        },
        "batchSize": 1
      }
    }
  ]
}
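
If you would rather generate the request body programmatically, a minimal Python sketch might look like the following. `build_request` is a hypothetical helper, not a Vision API function; it only mirrors the JSON structure above and checks the feature and MIME types listed on this page. The output location in the usage lines is a placeholder.

```python
import json

ALLOWED_FEATURES = {"DOCUMENT_TEXT_DETECTION", "TEXT_DETECTION"}
ALLOWED_MIME_TYPES = {"application/pdf", "image/tiff"}


def build_request(file_uri, output_uri, feature_type="DOCUMENT_TEXT_DETECTION",
                  mime_type="application/pdf", batch_size=1):
    """Build the files:asyncBatchAnnotate request body as a Python dict."""
    if feature_type not in ALLOWED_FEATURES:
        raise ValueError(f"unsupported feature type: {feature_type}")
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": file_uri},
                    "mimeType": mime_type,
                },
                "features": [{"type": feature_type}],
                "outputConfig": {
                    "gcsDestination": {"uri": output_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


body = build_request(
    "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
    "gs://bucket/directory/",  # placeholder output location
)
# Print the body; redirect the output to request.json to use it with curl.
print(json.dumps(body, indent=2))
```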

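To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: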

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vision.googleapis.com/v1/files:asyncBatchAnnotate"

PowerShell

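Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: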

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vision.googleapis.com/v1/files:asyncBatchAnnotate" | Select-Object -Expand Content

Response:

A successful asyncBatchAnnotate request returns a response with a single name field:

{
  "name": "projects/usable-auth-library/operations/1efec2285bd442df"
}

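This name represents a long-running operation with an associated ID (for example, 1efec2285bd442df), which you can query using the v1.operations API.

To retrieve your Vision annotation response, send a GET request to the v1.operations endpoint, passing the operation ID in the URL: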

GET https://vision.googleapis.com/v1/operations/operation-id

For example:

curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/projects/project-id/locations/location-id/operations/1efec2285bd442df

If the operation is in progress:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}

Once the operation has completed, the state shows as DONE and your results are written to the Google Cloud Storage location you specified:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}
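
The two operation responses above can be interpreted with a small amount of code. The following Python sketch is illustrative only: `summarize_operation` is a hypothetical helper that works on an operation resource already parsed from JSON, and the dictionaries in the usage lines are trimmed copies of the sample responses on this page.

```python
def summarize_operation(operation):
    """Return (state, done, output_uris) for a parsed operation resource."""
    state = operation.get("metadata", {}).get("state", "UNKNOWN")
    # The "done" field is absent while the operation is still running.
    done = bool(operation.get("done", False))
    output_uris = []
    if done:
        for item in operation.get("response", {}).get("responses", []):
            uri = item.get("outputConfig", {}).get("gcsDestination", {}).get("uri")
            if uri:
                output_uris.append(uri)
    return state, done, output_uris


running = {
    "name": "operations/1efec2285bd442df",
    "metadata": {"state": "RUNNING"},
}
finished = {
    "name": "operations/1efec2285bd442df",
    "metadata": {"state": "DONE"},
    "done": True,
    "response": {
        "responses": [
            {
                "outputConfig": {
                    "gcsDestination": {"uri": "gs://your-bucket-name/folder/"},
                    "batchSize": 1,
                }
            }
        ]
    },
}

print(summarize_operation(running))
print(summarize_operation(finished))
```

A polling loop would call the operations endpoint repeatedly and stop once the second element of the returned tuple is true.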

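The JSON in your output file is similar to that of an image's [document text detection request](/vision/docs/ocr), with the addition of a context field that shows the location of the specified PDF or TIFF and the number of pages in the file: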

output-1-to-1.json

Full file

{
  "inputConfig": {
    "gcsSource": {
      "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf"
    },
    "mimeType": "application/pdf"
  },
  "responses": [
    {
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en",
                  "confidence": 0.94
                }
              ]
            },
            "width": 612,
            "height": 792,
            "blocks": [
              {
                "boundingBox": {
                  "normalizedVertices": [
                    {
                      "x": 0.12908497,
                      "y": 0.10479798
                    },
                    ...
                    {
                      "x": 0.12908497,
                      "y": 0.1199495
                    }
                  ]
                },
                "paragraphs": [
                  {
                  ...
                    },
                    "words": [
                      {
                        ...
                        },
                        "symbols": [
                          {
                          ...
                            "text": "C",
                            "confidence": 0.99
                          },
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ]
                            },
                            "text": "O",
                            "confidence": 0.99
                          },
             ...
             }
            ]
          }
        ],
        "text": "CONTENTS\n.\n1-1\nII-1\nIII-1\nList of Statistical Tables...
        \nHow to Use This Census Report ..\nTable Finding Guide .\nUser
        Notes .......\nStatistical Tables.........\nAppendixes
        \nA Geographic Terms and Concepts .........\nB Definitions of
        Subject Characteristics.\nData Collection and Processing Procedures...
        \nQuestionnaire. ........\nE Maps .................\nF Operational
        Overview and accuracy of the Data.......\nG Residence Rule and
        Residence Situations for the \n2010 Census of the United States...
        \nH Acknowledgments .....\nE\n*Appendix may be found in the separate
        volume, CPH-1-A, Summary Population and\nHousing Characteristics,
        Selected Appendixes, on the Internet at
        <www.census.gov\n/prod/cen2010/cph-1-a.pdf>.\nContents\n"
      },
      "context": {
        "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
        "pageNumber": 1
      }
    }
  ]
}
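
Because the output contains normalizedVertices rather than absolute coordinates, you may want to scale them back to the page. The following Python sketch shows one way to do that, under stated assumptions: `to_page_units` is a hypothetical helper, it assumes that the page's width and height fields (612 and 792 in the sample above) are the right scale factors, and it assumes that a coordinate missing from the JSON should be read as 0. The vertex values in the usage lines are illustrative and are not taken from the sample output.

```python
def to_page_units(normalized_vertices, page_width, page_height):
    """Scale normalized [0,1] vertices to the units of the page's width and height."""
    return [
        # Assumption: a coordinate omitted from the JSON is treated as 0.
        (v.get("x", 0.0) * page_width, v.get("y", 0.0) * page_height)
        for v in normalized_vertices
    ]


page = {"width": 612, "height": 792}
vertices = [{"x": 0.5, "y": 0.25}, {"x": 1.0}]  # illustrative values
print(to_page_units(vertices, page["width"], page["height"]))
# prints [(306.0, 198.0), (612.0, 0.0)]
```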

Go

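Before trying this sample, follow the Go setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Go API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.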


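import (
	"context"
	"fmt"
	"io"

	vision "cloud.google.com/go/vision/v2/apiv1"
	"cloud.google.com/go/vision/v2/apiv1/visionpb"
)

// detectAsyncDocumentURI performs Optical Character Recognition (OCR) on a
// PDF file stored in GCS.
func detectAsyncDocumentURI(w io.Writer, gcsSourceURI, gcsDestinationURI string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}
	// Release the client's resources when the function returns.
	defer client.Close()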

	request := &visionpb.AsyncBatchAnnotateFilesRequest{
		Requests: []*visionpb.AsyncAnnotateFileRequest{
			{
				Features: []*visionpb.Feature{
					{
						Type: visionpb.Feature_DOCUMENT_TEXT_DETECTION,
					},
				},
				InputConfig: &visionpb.InputConfig{
					GcsSource: &visionpb.GcsSource{Uri: gcsSourceURI},
					// Supported MimeTypes are: "application/pdf" and "image/tiff".
					MimeType: "application/pdf",
				},
				OutputConfig: &visionpb.OutputConfig{
					GcsDestination: &visionpb.GcsDestination{Uri: gcsDestinationURI},
					// How many pages should be grouped into each json output file.
					BatchSize: 2,
				},
			},
		},
	}

	operation, err := client.AsyncBatchAnnotateFiles(ctx, request)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "Waiting for the operation to finish.")

	resp, err := operation.Wait(ctx)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "%v", resp)

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API quickstart using client libraries. For more information, see the Vision API Java reference documentation.

/**
 * Performs document text OCR with PDF/TIFF as source files on Google Cloud Storage.
 *
 * @param gcsSourcePath The path to the remote file on Google Cloud Storage to detect document
 *     text on.
 * @param gcsDestinationPath The path to the remote file on Google Cloud Storage to store the
 *     results on.
 * @throws Exception on errors while closing the client.
 */
public static void detectDocumentsGcs(String gcsSourcePath, String gcsDestinationPath)
    throws Exception {

  // Initialize client that will be used to send requests. This client only needs to be created
  // once, and can be reused for multiple requests. After completing all of your requests, call
  // the "close" method on the client to safely clean up any remaining background resources.
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    List<AsyncAnnotateFileRequest> requests = new ArrayList<>();

    // Set the GCS source path for the remote file.
    GcsSource gcsSource = GcsSource.newBuilder().setUri(gcsSourcePath).build();

    // Create the configuration with the specified MIME (Multipurpose Internet Mail Extensions)
    // types
    InputConfig inputConfig =
        InputConfig.newBuilder()
            .setMimeType(
                "application/pdf") // Supported MimeTypes: "application/pdf", "image/tiff"
            .setGcsSource(gcsSource)
            .build();

    // Set the GCS destination path for where to save the results.
    GcsDestination gcsDestination =
        GcsDestination.newBuilder().setUri(gcsDestinationPath).build();

    // Create the configuration for the output with the batch size.
    // The batch size sets how many pages should be grouped into each JSON output file.
    OutputConfig outputConfig =
        OutputConfig.newBuilder().setBatchSize(2).setGcsDestination(gcsDestination).build();

    // Select the Feature required by the vision API
    Feature feature = Feature.newBuilder().setType(Feature.Type.DOCUMENT_TEXT_DETECTION).build();

    // Build the OCR request
    AsyncAnnotateFileRequest request =
        AsyncAnnotateFileRequest.newBuilder()
            .addFeatures(feature)
            .setInputConfig(inputConfig)
            .setOutputConfig(outputConfig)
            .build();

    requests.add(request);

    // Perform the OCR request
    OperationFuture<AsyncBatchAnnotateFilesResponse, OperationMetadata> response =
        client.asyncBatchAnnotateFilesAsync(requests);

    System.out.println("Waiting for the operation to finish.");

    // Wait for the request to finish. (The result is not used, since the API saves the result to
    // the specified location on GCS.)
    List<AsyncAnnotateFileResponse> result =
        response.get(180, TimeUnit.SECONDS).getResponsesList();

    // Once the request has completed and the output has been
    // written to GCS, we can list all the output files.
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Get the destination location from the gcsDestinationPath
    Pattern pattern = Pattern.compile("gs://([^/]+)/(.+)");
    Matcher matcher = pattern.matcher(gcsDestinationPath);

    if (matcher.find()) {
      String bucketName = matcher.group(1);
      String prefix = matcher.group(2);

      // Get the list of objects with the given prefix from the GCS bucket
      Bucket bucket = storage.get(bucketName);
      com.google.api.gax.paging.Page<Blob> pageList = bucket.list(BlobListOption.prefix(prefix));

      Blob firstOutputFile = null;

      // List objects with the given prefix.
      System.out.println("Output files:");
      for (Blob blob : pageList.iterateAll()) {
        System.out.println(blob.getName());

        // Process the first output file from GCS.
        // Since we specified batch size = 2, the first response contains
        // the first two pages of the input file.
        if (firstOutputFile == null) {
          firstOutputFile = blob;
        }
      }

      // Get the contents of the file and convert the JSON contents to an AnnotateFileResponse
      // object. If the Blob is small read all its content in one request
      // (Note: the file is a .json file)
      // Storage guide: https://cloud.google.com/storage/docs/downloading-objects
      String jsonContents = new String(firstOutputFile.getContent());
      Builder builder = AnnotateFileResponse.newBuilder();
      JsonFormat.parser().merge(jsonContents, builder);

      // Build the AnnotateFileResponse object
      AnnotateFileResponse annotateFileResponse = builder.build();

      // Parse through the object to get the actual response for the first page of the input file.
      AnnotateImageResponse annotateImageResponse = annotateFileResponse.getResponses(0);

      // Here we print the full text from the first page.
      // The response contains more information:
      // annotation/pages/blocks/paragraphs/words/symbols
      // including confidence score and bounding boxes
      System.out.format("%nText: %s%n", annotateImageResponse.getFullTextAnnotation().getText());
    } else {
      System.out.println("No MATCH");
    }
  }
}

Node.js

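Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.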


// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision').v1;

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// Bucket where the file resides
// const bucketName = 'my-bucket';
// Path to PDF file within bucket
// const fileName = 'path/to/document.pdf';
// The folder to store the results
// const outputPrefix = 'results'

const gcsSourceUri = `gs://${bucketName}/${fileName}`;
const gcsDestinationUri = `gs://${bucketName}/${outputPrefix}/`;

const inputConfig = {
  // Supported mime_types are: 'application/pdf' and 'image/tiff'
  mimeType: 'application/pdf',
  gcsSource: {
    uri: gcsSourceUri,
  },
};
const outputConfig = {
  gcsDestination: {
    uri: gcsDestinationUri,
  },
};
const features = [{type: 'DOCUMENT_TEXT_DETECTION'}];
const request = {
  requests: [
    {
      inputConfig: inputConfig,
      features: features,
      outputConfig: outputConfig,
    },
  ],
};

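// The client calls use await, so they are wrapped in an async function.
async function detectPdfText() {
  const [operation] = await client.asyncBatchAnnotateFiles(request);
  const [filesResponse] = await operation.promise();
  const destinationUri =
    filesResponse.responses[0].outputConfig.gcsDestination.uri;
  console.log('Json saved to: ' + destinationUri);
}

detectPdfText().catch(console.error);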

Python

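Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.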

def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    import json
    import re
    from google.cloud import vision
    from google.cloud import storage

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = "application/pdf"

    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()

    feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.GcsSource(uri=gcs_source_uri)
    input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size
    )

    async_request = vision.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config, output_config=output_config
    )

    operation = client.async_batch_annotate_files(requests=[async_request])

    print("Waiting for the operation to finish.")
    operation.result(timeout=420)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r"gs://([^/]+)/(.+)", gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name)

    # List objects with the given prefix, filtering out folders.
    blob_list = [
        blob
        for blob in list(bucket.list_blobs(prefix=prefix))
        if not blob.name.endswith("/")
    ]
    print("Output files:")
    for blob in blob_list:
        print(blob.name)

    # Process the first output file from GCS.
    # Since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_bytes().decode("utf-8")
    response = json.loads(json_string)

    # The actual response for the first page of the input file.
    first_page_response = response["responses"][0]
    annotation = first_page_response["fullTextAnnotation"]

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print("Full text:\n")
    print(annotation["text"])

gcloud

The gcloud command you use depends on the file type.

To perform PDF text detection, use the gcloud ml vision detect-text-pdf command as shown in the following example:
```
gcloud ml vision detect-text-pdf gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
```
To perform TIFF text detection, use the gcloud ml vision detect-text-tiff command as shown in the following example:
```
gcloud ml vision detect-text-tiff gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
```

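Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then see the Vision reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then see the Vision reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then see the Vision reference documentation for Ruby.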

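Multi-regional support

This capability currently applies only to OCR features (the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION types).

You can now specify continent-level data storage and OCR processing. The following regions are currently supported:

us: United States only
eu: The European Union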

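Locations

Cloud Vision gives you some control over where the resources for your project are stored and processed. In particular, you can configure Cloud Vision to store and process your data only in the European Union.

By default, Cloud Vision stores and processes resources in a global location, which means that it doesn't guarantee that your resources remain within a particular location or region. If you choose the European Union location, Google stores and processes your data only in the European Union. You and your users can access the data from any location.

Setting the location using the API

The Vision API supports a global API endpoint (vision.googleapis.com) and two region-based endpoints: a European Union endpoint (eu-vision.googleapis.com) and a United States endpoint (us-vision.googleapis.com). Use these endpoints for region-specific processing. For example, to store and process your data only in the European Union, use the URI eu-vision.googleapis.com in place of vision.googleapis.com in your REST API calls: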

https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/images:annotate
https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/images:asyncBatchAnnotate
https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/files:annotate
https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/files:asyncBatchAnnotate

To store and process your data only in the United States, use the US endpoint (us-vision.googleapis.com) with the methods listed above.
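
The regional URLs above all follow one pattern, which can be sketched as a small Python function. `regional_url` is a hypothetical helper introduced only for illustration; it assumes the two region identifiers named on this page (us and eu) and the URL layout of the examples above.

```python
SUPPORTED_REGIONS = ("us", "eu")


def regional_url(region_id, project_id, method="files:asyncBatchAnnotate"):
    """Build a region-specific Vision API URL following the pattern shown above."""
    if region_id not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region_id}")
    return (
        f"https://{region_id}-vision.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region_id}/{method}"
    )


# PROJECT_ID is a placeholder for your Google Cloud project ID.
print(regional_url("eu", "PROJECT_ID"))
print(regional_url("us", "PROJECT_ID", method="images:annotate"))
```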

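Setting the location using the client libraries

The Vision API client libraries access the global API endpoint (vision.googleapis.com) by default. To store and process your data only in the European Union, you need to set the endpoint (eu-vision.googleapis.com) explicitly. The following code samples show how to configure this setting.

Note: This feature returns results that contain normalizedVertices in the range [0,1], not actual pixel values (vertices).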

REST

Before using any of the request data, make the following replacements:

REGION_ID: one of the valid regional location identifiers:
- us: United States only
- eu: The European Union
CLOUD_STORAGE_IMAGE_URI: the path to a valid file (PDF/TIFF) in a Cloud Storage bucket. You must have at least read permission to the file. For example:
- ```
gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf
```
CLOUD_STORAGE_BUCKET: the Cloud Storage bucket/directory in which to save the output files, expressed in the following form:
- gs://bucket/directory/
The requesting user must have write permission to the bucket.
FEATURE_TYPE: a valid feature type. For files:asyncBatchAnnotate requests, you can use the following feature types:
- DOCUMENT_TEXT_DETECTION
- TEXT_DETECTION
PROJECT_ID: your Google Cloud project ID.

Field-specific considerations:

inputConfig - replaces the image field used in other Vision API requests. It contains two child fields:
- gcsSource.uri - the Google Cloud Storage URI of the PDF or TIFF file (accessible to the user or service account making the request).
- mimeType - one of the accepted file types: application/pdf or image/tiff.
outputConfig - specifies output details. It contains two child fields:
- gcsDestination.uri - a valid Google Cloud Storage URI. The bucket must be writable by the user or service account making the request. The file name is output-x-to-y, where x and y are the PDF/TIFF page numbers included in that output file. If the file already exists, its contents are overwritten.
- batchSize - specifies how many pages of output to include in each output JSON file.

HTTP method and URL:

POST https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate

Request JSON body:

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "CLOUD_STORAGE_IMAGE_URI"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "FEATURE_TYPE"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "CLOUD_STORAGE_BUCKET"
        },
        "batchSize": 1
      }
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and then run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate"

PowerShell

Save the request body in a file named request.json, and then run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate" | Select-Object -Expand Content

Response:

A successful asyncBatchAnnotate request returns a response with a single name field:

{
  "name": "projects/usable-auth-library/operations/1efec2285bd442df"
}

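This name represents a long-running operation with an associated ID (for example, 1efec2285bd442df), which you can query using the v1.operations API.

To retrieve your Vision annotation response, send a GET request to the v1.operations endpoint, passing the operation ID in the URL: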

GET https://vision.googleapis.com/v1/operations/operation-id

For example:

curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/projects/project-id/locations/location-id/operations/1efec2285bd442df

If the operation is in progress:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}

Once the operation has completed, the state shows as DONE and your results are written to the Google Cloud Storage location you specified:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}

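The JSON in your output file is similar to an image's document text detection response if you used the DOCUMENT_TEXT_DETECTION feature, or to a text detection response if you used the TEXT_DETECTION feature. The output also has a context field that shows the location of the specified PDF or TIFF and the number of pages in the file: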

output-1-to-1.json

Full file

{
  "inputConfig": {
    "gcsSource": {
      "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf"
    },
    "mimeType": "application/pdf"
  },
  "responses": [
    {
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en",
                  "confidence": 0.94
                }
              ]
            },
            "width": 612,
            "height": 792,
            "blocks": [
              {
                "boundingBox": {
                  "normalizedVertices": [
                    {
                      "x": 0.12908497,
                      "y": 0.10479798
                    },
                    ...
                    {
                      "x": 0.12908497,
                      "y": 0.1199495
                    }
                  ]
                },
                "paragraphs": [
                  {
                  ...
                    },
                    "words": [
                      {
                        ...
                        },
                        "symbols": [
                          {
                          ...
                            "text": "C",
                            "confidence": 0.99
                          },
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ]
                            },
                            "text": "O",
                            "confidence": 0.99
                          },
             ...
             }
            ]
          }
        ],
        "text": "CONTENTS\n.\n1-1\nII-1\nIII-1\nList of Statistical Tables...
        \nHow to Use This Census Report ..\nTable Finding Guide .\nUser
        Notes .......\nStatistical Tables.........\nAppendixes
        \nA Geographic Terms and Concepts .........\nB Definitions of
        Subject Characteristics.\nData Collection and Processing Procedures...
        \nQuestionnaire. ........\nE Maps .................\nF Operational
        Overview and accuracy of the Data.......\nG Residence Rule and
        Residence Situations for the \n2010 Census of the United States...
        \nH Acknowledgments .....\nE\n*Appendix may be found in the separate
        volume, CPH-1-A, Summary Population and\nHousing Characteristics,
        Selected Appendixes, on the Internet at
        <www.census.gov\n/prod/cen2010/cph-1-a.pdf>.\nContents\n"
      },
      "context": {
        "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
        "pageNumber": 1
      }
    }
  ]
}

Go

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"

	vision "cloud.google.com/go/vision/apiv1"
	"google.golang.org/api/option"
)

// setEndpoint changes your endpoint.
func setEndpoint(endpoint string) error {
	// endpoint := "eu-vision.googleapis.com:443"

	ctx := context.Background()
	client, err := vision.NewImageAnnotatorClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("NewImageAnnotatorClient: %w", err)
	}
	defer client.Close()

	return nil
}

Java

ImageAnnotatorSettings settings =
    ImageAnnotatorSettings.newBuilder().setEndpoint("eu-vision.googleapis.com:443").build();

// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
ImageAnnotatorClient client = ImageAnnotatorClient.create(settings);

Node.js

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

async function setEndpoint() {
  // Specifies the location of the api endpoint
  const clientOptions = {apiEndpoint: 'eu-vision.googleapis.com'};

  // Creates a client
  const client = new vision.ImageAnnotatorClient(clientOptions);

  // Performs text detection on the image file
  const [result] = await client.textDetection('./resources/wakeupcat.jpg');
  const labels = result.textAnnotations;
  console.log('Text:');
  labels.forEach(label => console.log(label.description));
}
setEndpoint();

Python

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import vision

client_options = {"api_endpoint": "eu-vision.googleapis.com"}

client = vision.ImageAnnotatorClient(client_options=client_options)

Try it for yourself

If you're new to Google Cloud, you can create an account to evaluate how the Cloud Vision API performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Try Cloud Vision API free
