The Process-with-Document AI pipeline lets you process existing documents with a Document AI processor and update the document properties with the newly extracted entities.
Prerequisites
Before you begin, you need the following:
A Document AI processor ready under the same Google Cloud project.
- If you don't have a processor, follow the steps to create one. You can choose to create any type as long as the processor type matches the document type.
Dedicated Cloud Storage folders for storing exported documents and processed documents.
- Make sure the folders are empty before you start the pipeline.
A schema with mappings between Document AI entities and Document AI Warehouse properties.
- Without such a mapping, the newly extracted entities might not be converted correctly to Document AI Warehouse properties.
- To add mappings to the schema, follow the "set schemas with mapping" guide.
Run the pipeline
REST
curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${AUTH_TOKEN}" \
--data '{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION",
  "process_with_doc_ai_pipeline": {
    "documents": [
      "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
    ],
    "export_folder_path": "gs://EXPORT_FOLDER",
    "processor_info": {
      "processor_name": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR"
    },
    "processor_results_folder_path": "gs://PROCESS_FOLDER"
  },
  "request_metadata": {
    "user_info": {
      "id": "user:USER_EMAIL_ADDRESS"
    }
  }
}'
The documents field lists the resource names of the documents to process. The export_folder_path field is the Cloud Storage folder where the exported documents are stored before they are sent to the processor, and processor_results_folder_path is the Cloud Storage folder for the processor results. For more information about the request body fields, refer to the API documentation.
This command returns a resource name for a long-running operation. With this resource name, you can track the progress of the pipeline by following the next step.
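The request can also be assembled from shell variables, which keeps the resource names in one place when the pipeline is run repeatedly. The following is a minimal sketch, not an official client: the function names build_pipeline_body and run_pipeline are hypothetical, and it assumes that AUTH_TOKEN already holds a valid access token (for example, from gcloud auth print-access-token).

```shell
# Sketch only: function names and argument order are illustrative assumptions.

# Prints the runPipeline request body for the Process-with-Document AI pipeline.
# Arguments: parent, document name, export folder, processor name,
#            processor results folder, user ID.
build_pipeline_body() {
  local parent="$1" document="$2" export_folder="$3"
  local processor="$4" results_folder="$5" user_id="$6"
  cat <<EOF
{
  "name": "${parent}",
  "process_with_doc_ai_pipeline": {
    "documents": ["${document}"],
    "export_folder_path": "${export_folder}",
    "processor_info": {"processor_name": "${processor}"},
    "processor_results_folder_path": "${results_folder}"
  },
  "request_metadata": {"user_info": {"id": "${user_id}"}}
}
EOF
}

# Sends the request. Takes the same six arguments as build_pipeline_body.
# Assumes AUTH_TOKEN is set in the environment.
run_pipeline() {
  curl --silent --request POST \
    "https://contentwarehouse.googleapis.com/v1/$1:runPipeline" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${AUTH_TOKEN}" \
    --data "$(build_pipeline_body "$@")"
}
```

Calling build_pipeline_body on its own prints the JSON without sending anything, which is a convenient way to review the body before running the pipeline.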
Get long-running operation result
REST
curl --location --request GET 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
--header "Authorization: Bearer ${AUTH_TOKEN}"
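Because the pipeline runs asynchronously, the GET request usually has to be repeated until the operation reports that it is done. The following sketch polls the operation at a fixed interval. The function names, the default of 30 attempts, and the 10-second delay are assumptions for illustration. It also assumes that the response follows the standard long-running operation shape, in which a finished operation carries "done": true, and that AUTH_TOKEN is set.

```shell
# Sketch only: helper names, attempt count, and delay are illustrative assumptions.

# Succeeds (exit status 0) when the operation JSON on stdin contains "done": true.
operation_is_done() {
  grep -Eq '"done"[[:space:]]*:[[:space:]]*true'
}

# Polls the operation until it finishes or the attempts run out.
# Arguments: operation resource name, max attempts (default 30),
#            delay between attempts in seconds (default 10).
wait_for_operation() {
  local operation="$1" attempts="${2:-30}" delay="${3:-10}"
  local i=0 response
  while [ "$i" -lt "$attempts" ]; do
    response="$(curl --silent --request GET \
      "https://contentwarehouse.googleapis.com/v1/${operation}" \
      --header "Authorization: Bearer ${AUTH_TOKEN}")"
    if printf '%s' "$response" | operation_is_done; then
      # Print the final operation so the caller can inspect the result.
      printf '%s\n' "$response"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "operation ${operation} did not finish after ${attempts} attempts" >&2
  return 1
}
```

A finished operation is not necessarily a successful one: the printed JSON may contain an error field instead of a response, so it is worth checking which of the two is present.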
Next steps
Go to the Document AI Warehouse UI or call the document:get API to check whether the documents were updated successfully.
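The API check can be sketched in the same style as the requests above. This example assumes that the document:get method is a POST to the document resource name followed by :get, with the same request_metadata block in the body. Confirm the exact method and fields in the API documentation before relying on it. The function names are hypothetical, and AUTH_TOKEN is assumed to be set.

```shell
# Sketch only: the endpoint shape and function names are assumptions.

# Prints the :get URL for a document resource name.
document_get_url() {
  printf 'https://contentwarehouse.googleapis.com/v1/%s:get' "$1"
}

# Fetches a document so its properties can be inspected.
# Arguments: document resource name, user ID (for example, user:EMAIL).
get_document() {
  local document="$1" user_id="$2"
  curl --silent --request POST "$(document_get_url "${document}")" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${AUTH_TOKEN}" \
    --data "{\"request_metadata\": {\"user_info\": {\"id\": \"${user_id}\"}}}"
}
```

If the pipeline succeeded, the returned document should show the newly extracted entities in its properties.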