Asynchronous image detection and annotation can be performed on a list of generic files, such as PDF files, which might contain multiple pages and multiple images per page. The files to be processed must be stored in an object storage bucket, and the detected text is written to an object storage bucket in JSON format.
You can retrieve the progress and results of an asynchronous batch annotation request by using the google.longrunning.Operations interface. The Operation.metadata field contains OperationMetadata. The Operation.response field contains the AsyncBatchAnnotateFilesResponse results.
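For example, once an operation name is available (see Validate the jobs and check the status later on this page), you can read both fields from the operation resource. The following is a sketch only; it uses the get endpoint described later on this page and assumes that the jq utility is installed:

# Print the OperationMetadata, the completion flag, and, once the operation is done,
# the AsyncBatchAnnotateFilesResponse.
curl -s -X GET "http://ENDPOINT/v1/OPERATION_NAME" | jq '{metadata: .metadata, done: .done, response: .response}'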
Before you begin
Follow these steps before using an asynchronous OCR API:
- Create a project to use for your asynchronous OCR requests. You can create the project using a custom resource (CR) like the following (a kubectl sketch for applying it follows this list):
apiVersion: resourcemanager.gdc.goog/v1
kind: Project
metadata:
  labels:
    atat.config.google.com/clin-number: CLIN_NUMBER
    atat.config.google.com/task-order-number: TASK_ORDER_NUMBER
  name: ocr-async-project
  namespace: platform
- Ask your Project IAM Admin to grant you the AI OCR Developer (ai-ocr-developer) role in your project namespace.
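If you use the CR approach, the following sketch shows one way to apply the Project manifest. It assumes you saved the manifest as ocr-async-project.yaml (an illustrative file name) and that your kubectl context points at the API server that manages Project resources:

# Apply the Project custom resource and confirm that it was created.
kubectl apply -f ocr-async-project.yaml
kubectl get projects.resourcemanager.gdc.goog -n platform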
Prepare your environment
Before using an asynchronous OCR API to detect text offline, you must do the following:
- Create a storage bucket in the project.
- Select the Standard class.
- Grant read and write permissions on the bucket to the service account (g-vai-ocr-sie-sa) used by the OCR service.
Alternatively, you can follow these steps to create the storage bucket, role, and role binding using custom resources (CRs):
Create the storage bucket.
apiVersion: object.gdc.goog/v1
kind: Bucket
metadata:
  name: ocr-async-bucket
  namespace: ocr-async-project
spec:
  description: bucket for async ocr
  storageClass: Standard
  bucketPolicy:
    lockingPolicy:
      defaultObjectRetentionDays: 90
Create the role.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ocr-async-reader-writer
  namespace: ocr-async-project
rules:
- apiGroups:
  - object.gdc.goog
  resources:
  - buckets
  verbs:
  - read-object
  - write-object
Create the role binding.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ocr-async-reader-writer-rolebinding
  namespace: ocr-async-project
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ocr-async-reader-writer
subjects:
- kind: ServiceAccount
  name: g-vai-ocr-sie-sa
  namespace: g-vai-ocr-sie
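As a sketch, assuming you saved the three manifests above using the illustrative file names below and that your kubectl context targets the cluster that serves these resources, you can apply and verify them as follows:

# Apply the bucket, role, and role binding custom resources.
kubectl apply -f ocr-async-bucket.yaml
kubectl apply -f ocr-async-role.yaml
kubectl apply -f ocr-async-rolebinding.yaml

# Verify that the resources exist in the project namespace.
kubectl get buckets.object.gdc.goog -n ocr-async-project
kubectl get role,rolebinding -n ocr-async-project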
Upload files to the object storage bucket
For the OCR service to process your files, you must upload them to the object storage bucket.
Follow these steps:
- To configure the gdcloud CLI for storage, see Install and configure the storage CLI for projects.
- For the steps to upload objects to a storage bucket, see Upload and download storage objects in projects.
Trigger the AsyncBatchAnnotateFilesRequest request
The AsyncBatchAnnotateFilesRequest initiates the offline processing and returns
the ID of the long-running process that performs text detection on the file. The
returned ID can be used to track the status of the offline processing. If there
are too many ongoing operations, the offline processing might not start
immediately.
Before sending a request, ensure that the OCR service account has read permission on your input bucket and write permission on your output bucket. The input and output buckets can be different and can be in different project namespaces. However, we recommend using the same bucket for input and output so that, if you provide a wrong bucket name, the results are not written to a bucket that doesn't belong to you.
To call the AsyncBatchAnnotateFilesRequest, you must specify the following:
- Input file: The file that you want to annotate.
- Output destination: The location where you want to store the annotated results.
- Project ID: The ID of the project that you want to use.
- Endpoint: The endpoint that you want to use.
curl
echo '{"parent":PROJECT_ID,"requests": [{"features": [{"type": "DOCUMENT_TEXT_DETECTION"}],"input_config": {"gcs_source": {"uri": INPUT_FILE},"mime_type": "application/pdf"},"output_config": {"gcs_destination": {"uri": OUTPUT_DESTINATION}}}]}' | curl --data-binary @- -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" ENDPOINT/v1/files:asyncBatchAnnotate
Python
The vc.async_batch_annotate_files() function returns a Google API Core operation object. This object contains a long-running operation (LRO), which can be accessed by calling operation.operation. The operation name can be obtained from the LRO, and you can use the name to query the status of the LRO. The operation.result() method waits until the LRO is complete and then returns the result.
def vision_func_async(creds):
    # Create the OCR (Vision) client from the provided credentials.
    vc = vision_client(creds)

    # Build the request: feature type, input file, and output destination.
    features = [{"type_": vision.Feature.Type.DOCUMENT_TEXT_DETECTION}]
    input_config = {"gcs_source": {"uri": INPUT_FILE}, "mime_type": "application/pdf"}
    output_config = {"gcs_destination": {"uri": OUTPUT_DESTINATION}}
    req = {"input_config": input_config, "output_config": output_config, "features": features}
    reqs = {"requests": [req], "parent": PROJECT_ID}

    # Start the offline processing; `operation` is a Google API Core operation object.
    operation = vc.async_batch_annotate_files(request=reqs)

    # The underlying long-running operation (LRO); its name can be used to query status.
    lro = operation.operation

    # Block until the LRO completes; `resp` holds the result.
    resp = operation.result()
Validate the jobs and check the status
The OPERATION_NAME returned by the AsyncBatchAnnotateFiles function is required to check the status of the operation.
Get operation
The get method returns the latest state of a long-running operation. Use this method to poll the operation result generated by the OCR service. To call the get method, specify your OPERATION_NAME and the ENDPOINT.
curl
curl -X GET "http://ENDPOINT/v1/OPERATION_NAME"
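If you want to block until the operation completes, a simple polling loop is one option. This is a sketch only; it reuses the ENDPOINT and OPERATION_NAME placeholders from the command above and assumes that the jq utility is installed:

# Poll the operation every 30 seconds until it reports "done": true.
while true; do
  STATUS=$(curl -s -X GET "http://ENDPOINT/v1/OPERATION_NAME")
  echo "$STATUS" | jq '{done: .done, metadata: .metadata}'
  if [ "$(echo "$STATUS" | jq -r '.done')" = "true" ]; then
    break
  fi
  sleep 30
done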
List operations
The list method returns a list of the operations that match a specified filter in the request. The method can return operations from a specific project. To call the list method, specify your PROJECT_ID and the ENDPOINT.
curl
curl -X GET "http://ENDPOINT/v1/PROJECT_ID?page_size=10"
Delete the bucket
For more information, see Delete objects in storage buckets.