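Process documents with a Document AI processor
Caution: Document AI Warehouse is deprecated and will no longer be available on Google Cloud after January 16, 2025. To safeguard your data, migrate any documents currently saved in Document AI Warehouse to an alternative like Cloud Storage. Verify that your data migration is completed before the discontinuation date to prevent any data loss. See Deprecations for details.
Preview: This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.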
This document describes how to update existing documents with newly processed
results from another Document AI processor.
Current limitations
The ProcessWithDocAi pipeline only works with documents that were already processed during ingestion. Future versions of the pipeline will extend support to documents that have not been processed yet. Check the release notes for updates.
Configure schema mapping in document schema (admin only)
The Set schemas with mapping guide describes how to configure new or existing properties of a Document AI Warehouse schema so that they are mapped to entities extracted by Document AI. This section demonstrates the configuration in the Document AI Warehouse admin console.
Go to the Document AI Warehouse admin console
Make sure that you have admin access, then go to the admin console at https://documentwarehouse.cloud.google.com/admin.
Create a new schema with schema mapping
The following steps create a new schema with schema mapping. The same steps also apply to an existing schema, which you update from the schema details view. Note that schema updates are currently limited: removing or modifying existing properties is not supported.
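Because removing or modifying existing properties is not supported, it can help to check a proposed change before pasting it into the JSON editor. The following Python sketch assumes that property definitions are plain dictionaries keyed by name, as they appear in the JSON editor. The function name check_schema_update and the invoice_id property are hypothetical and are not part of any Document AI Warehouse API.

```python
def check_schema_update(existing, updated):
    """Return the names of properties that the update adds.

    Raises ValueError if the update removes or changes an existing
    property, which this guide says is not supported.
    """
    old = {prop["name"]: prop for prop in existing}
    new = {prop["name"]: prop for prop in updated}

    removed = sorted(old.keys() - new.keys())
    if removed:
        raise ValueError(f"removing properties is not supported: {removed}")

    # Any difference in an existing definition counts as a modification.
    modified = sorted(name for name in old if old[name] != new[name])
    if modified:
        raise ValueError(f"modifying properties is not supported: {modified}")

    return sorted(new.keys() - old.keys())


existing = [{"name": "invoice_id", "is_required": False}]  # hypothetical
updated = existing + [{"name": "total_amount", "float_type_options": {}}]
print(check_schema_update(existing, updated))  # ['total_amount']
```

An update that only appends new definitions passes the check. An update that drops or edits an existing definition raises before anything reaches the console.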
Click Add new in the Schema manager tab.
Enter a display name and, optionally, a description.
You cannot modify the display name after the schema is created. The description can be updated later.
Click Next. The JSON editor of the schema appears, including the display
name and description entered in the previous step.
In this example, add a total_amount property to the property_definitions list:
    {
      "name": "total_amount",
      "display_name": "Total Amount",
      "is_repeatable": false,
      "is_filterable": true,
      "is_searchable": true,
      "is_metadata": false,
      "is_required": false,
      "float_type_options": {},
      "schema_sources": [
        {
          "name": "total_amount",
          "processor_type": "INVOICE_PROCESSOR"
        }
      ]
    }
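This configuration maps the property to the total_amount field extracted by INVOICE_PROCESSOR. To get the full list of processor types, use the fetchProcessorTypes API.
Note: For more details about property definitions, see the PropertyDefinition section of the documentSchemas REST reference.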
Click Done to create the schema. A confirmation message appears when creation completes.
You can then view the schema in detail.
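If you manage schemas outside the admin console, you might generate mapped property definitions instead of typing them by hand. This Python sketch reproduces the total_amount definition used in this example. The helper names and the "Invoice" display name are hypothetical. The boolean flags are the values used in this example rather than required defaults, and the snake_case field names are copied from the JSON editor content in this guide.

```python
import json


def mapped_float_property(name, display_name, processor_type, entity_name=None):
    """Build a float property mapped to an entity extracted by processor_type.

    The flag values mirror the total_amount example in this guide.
    """
    return {
        "name": name,
        "display_name": display_name,
        "is_repeatable": False,
        "is_filterable": True,
        "is_searchable": True,
        "is_metadata": False,
        "is_required": False,
        "float_type_options": {},
        "schema_sources": [
            # The entity name defaults to the property name, as in the example.
            {"name": entity_name or name, "processor_type": processor_type}
        ],
    }


def properties_mapped_from(schema, processor_type):
    """List the property names that have a schema source for processor_type."""
    return [
        prop["name"]
        for prop in schema.get("property_definitions", [])
        if any(src.get("processor_type") == processor_type
               for src in prop.get("schema_sources", []))
    ]


schema = {
    "display_name": "Invoice",  # hypothetical schema name
    "property_definitions": [
        mapped_float_property("total_amount", "Total Amount", "INVOICE_PROCESSOR"),
    ],
}
print(json.dumps(schema, indent=2))
print(properties_mapped_from(schema, "INVOICE_PROCESSOR"))  # ['total_amount']
```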
Example: Use an invoice processor for the OCR-processed documents
With the property mapping defined in the schema, you can reprocess the OCR-processed documents with an invoice processor. The total_amount property is then mapped from the invoice processor's parsed results.
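One way to picture the mapping is as a lookup from each property's schema_sources entries to the entities that a processor of the matching type returns. The Python sketch below is a conceptual model only, not how Document AI Warehouse implements the mapping. It assumes that entities have already been reduced to hypothetical dictionaries with "type" and "value" keys, and the sample values are invented for illustration.

```python
def apply_schema_mapping(property_definitions, processor_type, entities):
    """Conceptual model: copy entity values into the properties whose
    schema_sources name this processor type and entity type."""
    values = {}
    for prop in property_definitions:
        for source in prop.get("schema_sources", []):
            if source.get("processor_type") != processor_type:
                continue
            matches = [e["value"] for e in entities if e["type"] == source["name"]]
            if matches:
                # Repeatable properties keep every match; others keep the first.
                values[prop["name"]] = matches if prop.get("is_repeatable") else matches[0]
    return values


properties = [{
    "name": "total_amount",
    "is_repeatable": False,
    "schema_sources": [{"name": "total_amount", "processor_type": "INVOICE_PROCESSOR"}],
}]
entities = [  # invented sample entities
    {"type": "total_amount", "value": 1234.5},
    {"type": "supplier_name", "value": "Example Corp"},
]
print(apply_schema_mapping(properties, "INVOICE_PROCESSOR", entities))  # {'total_amount': 1234.5}
```

In this model, results from a processor type that no schema source names populate nothing, which is why the schema mapping has to reference the invoice processor type.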
Select documents to reprocess.
Select the documents that you want to apply another processor to, then click Extract with Document AI on the action bar.
The pop-up dialog box has three input fields:
Processor ID: the ID of the processor that you want to use to process the documents.
Export file path: a staging bucket path where the documents are stored temporarily.
Processor results folder path: a bucket path where the documents parsed by the processor are stored.
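The dialog collects the same kinds of inputs that the ProcessWithDocAi pipeline takes when it is started through the runPipeline REST method. The Python sketch below assembles such a request body. The camelCase field names (processWithDocAiPipeline, documents, exportFolderPath, processorInfo.processorName, processorResultsFolderPath) are assumed from the ProcessWithDocAiPipeline REST reference and should be checked against it before use. The sketch sets only processorName inside processorInfo and omits any request metadata the API might require. The helper names and sample resource names are hypothetical, and applying the UI's 3,000-document selection limit to the API is also an assumption.

```python
import re

# Mirrors the UI selection limit described in this guide (assumed for the API).
MAX_DOCUMENTS = 3000
PROCESSOR_NAME = re.compile(r"projects/[^/]+/locations/[^/]+/processors/[^/]+")


def processor_resource_name(project, location, processor_id):
    """Expand a bare processor ID into a full processor resource name."""
    return f"projects/{project}/locations/{location}/processors/{processor_id}"


def build_process_with_docai_body(documents, processor_name,
                                  export_folder_path, results_folder_path):
    """Return a runPipeline request body for the ProcessWithDocAi pipeline."""
    if not 0 < len(documents) <= MAX_DOCUMENTS:
        raise ValueError(f"select between 1 and {MAX_DOCUMENTS} documents")
    if not PROCESSOR_NAME.fullmatch(processor_name):
        raise ValueError(f"unexpected processor name: {processor_name}")
    for path in (export_folder_path, results_folder_path):
        if not path.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage path: {path}")
    return {
        "processWithDocAiPipeline": {
            "documents": list(documents),
            # Staging location for the exported documents.
            "exportFolderPath": export_folder_path,
            "processorInfo": {"processorName": processor_name},
            # Where the processor's parsed documents are written.
            "processorResultsFolderPath": results_folder_path,
        }
    }


body = build_process_with_docai_body(
    ["projects/123/locations/us/documents/doc-1"],  # hypothetical document
    processor_resource_name("my-project", "us", "abc123"),  # hypothetical IDs
    "gs://my-bucket/export",
    "gs://my-bucket/results",
)
```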
Note: Currently, you can select at most 3,000 documents. If the search results contain more than 3,000 documents, Select All Search Results is disabled. Refine your search or filter criteria to reduce the number of documents in the search results.
Get the invoice processor ID.
As with the OCR processor in the bulk upload guide, you can create an invoice processor from the processor gallery (https://console.cloud.google.com/ai/document-ai/processor-library). Find the processor ID on the Processor Details page.
Trigger the pipeline and track the status.
After entering the processor ID and the two bucket paths in the dialog, click Extract to trigger the pipeline.
After you trigger the pipeline, a status tracking page appears. Currently, the page does not show in-progress tracking: the status remains pending until the job is complete.
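Because the status page reports only pending until the job finishes, a script that starts the pipeline through the API would typically poll at a modest interval with an overall timeout. This Python sketch assumes a hypothetical is_done callable, for example one that fetches the long-running operation that runPipeline is assumed to return and reads its done field. The interval and timeout defaults are arbitrary choices.

```python
import time


def wait_until_done(is_done, poll_seconds=30.0, timeout_seconds=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call is_done() until it returns True and return the number of checks.

    Raises TimeoutError if the job is still pending after timeout_seconds.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    checks = 0
    while True:
        checks += 1
        if is_done():
            return checks
        if clock() >= deadline:
            raise TimeoutError(f"still pending after {timeout_seconds} seconds")
        sleep(poll_seconds)


statuses = iter([False, False, True])  # simulated: pending, pending, done
print(wait_until_done(lambda: next(statuses), sleep=lambda s: None))  # 3
```

Injecting sleep and clock keeps the polling logic separate from any particular client library, so the same loop works whichever way the job status is fetched.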
Examine the updated results.
When the job is complete, click a document to check whether its extraction results have been updated correctly. You can see that the total_amount property has been mapped from the parsed results. In the AI view, you can find the complete list of extracted entities.
Note: Entity mapping also works with Bulk Upload with Document AI.
Next step
Follow the next guide to export the documents to a Custom Document Extractor in Document AI Workbench.
Last updated 2025-08-25 UTC.