# Process-with-Document-AI pipeline

**Caution:** Document AI Warehouse is deprecated and will no longer be available on Google Cloud after January 16, 2025. To safeguard your data, migrate any documents currently saved in Document AI Warehouse to an alternative such as Cloud Storage. Verify that your data migration is completed before the discontinuation date to prevent any data loss. See [Deprecations](/document-warehouse/docs/deprecations) for details.
The Process-with-Document AI pipeline lets you process existing documents with a Document AI processor and update the document properties with the newly extracted entities.
Prerequisites
-------------
Before you begin, you need the following:
1. A Document AI processor ready under the same Google Cloud project.

   - If you don't have a processor, follow the steps to [create one](/document-ai/docs/create-processor). You can create any processor type, as long as it matches the type of your documents.

2. Dedicated Cloud Storage folders for storing exported documents and processed documents.

   - Make sure the folders are empty before you start the pipeline.

3. A schema with mappings between Document AI entities and Document AI Warehouse properties.

   - Without such a mapping, the newly extracted entities might not be correctly converted to Document AI Warehouse entities.

   - To add mappings to the schema, follow [set schemas with mapping](/document-warehouse/docs/set-schema-with-mapping).

**Note:** The Process-with-Document AI pipeline can only process documents up to 200 pages.
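The role of the schema mapping can be illustrated with a small sketch. This is illustration only: the entity types and property names below are hypothetical, and the real conversion is performed by the pipeline, not by this code. It shows why entity types without a mapping are at risk of being dropped when extracted entities are written back as document properties.

```python
def map_entities_to_properties(entities, mapping):
    """Convert Document AI (entity_type, value) pairs to Warehouse property
    names using a schema mapping; entity types without a mapping are skipped."""
    properties = {}
    unmapped = []
    for entity_type, value in entities:
        if entity_type in mapping:
            properties[mapping[entity_type]] = value
        else:
            unmapped.append(entity_type)
    return properties, unmapped

# Hypothetical extractor output and schema mapping.
entities = [("invoice_id", "INV-001"), ("total_amount", "42.50"), ("due_date", "2024-07-01")]
mapping = {"invoice_id": "invoice_number", "total_amount": "amount"}  # no mapping for due_date

props, skipped = map_entities_to_properties(entities, mapping)
print(props)    # {'invoice_number': 'INV-001', 'amount': '42.50'}
print(skipped)  # ['due_date'] -- this entity would not reach the document properties
```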
Run the pipeline
----------------

### REST

    curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer ${AUTH_TOKEN}" \
      --data '{
        "name": "projects/PROJECT_NUMBER/locations/LOCATION",
        "process_with_doc_ai_pipeline": {
          "documents": [
            "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
          ],
          "export_folder_path": "gs://EXPORT_FOLDER",
          "processor_info": {
            "processor_name": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR"
          },
          "processor_results_folder_path": "gs://PROCESS_FOLDER"
        },
        "request_metadata": {
          "user_info": {
            "id": "user:USER_EMAIL_ADDRESS"
          }
        }
      }'

The `documents` list contains the resource names of the documents to be processed. The Cloud Storage folder path `export_folder_path` is used to store the exported documents before they are sent to the processor. For more information about the request body fields, refer to the [API documentation](/document-warehouse/docs/reference/rest/v1/projects.locations/runPipeline#processwithdocaipipeline).

This command returns a resource name for a long-running operation. With this resource name, you can track the progress of the pipeline by following the next step.

Get long-running operation result
---------------------------------

### REST

    curl --location --request GET 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
      --header "Authorization: Bearer ${AUTH_TOKEN}"

Next steps
----------

Go to the [Document AI Warehouse UI](https://documentwarehouse.cloud.google.com/) or call the [document:get](/document-warehouse/docs/reference/rest/v1/projects.locations.documents/get) API to check if documents are successfully updated.

Last updated 2025-08-25 UTC.