# Export-to-Workbench pipeline

> **Caution**: Document AI Warehouse is deprecated and will no longer be available on Google Cloud after January 16, 2025. To safeguard your data, migrate any documents currently saved in Document AI Warehouse to an alternative such as Cloud Storage. Verify that your data migration is completed before the discontinuation date to prevent any data loss. See [Deprecations](/document-warehouse/docs/deprecations) for details.

You can transfer documents from Document AI Warehouse to Document AI Workbench
using the export-to-Workbench pipeline. The pipeline exports the
documents to a Cloud Storage folder, then imports them into a
Document AI dataset. You provide both the Cloud Storage folder and
the Document AI dataset.
Prerequisites
-------------

Before you begin, you need the following:

- Under the same Google Cloud project, follow the steps to [create a processor](https://cloud.google.com/document-ai/docs/workbench/build-custom-processor#create_a_processor).
- Dedicate an empty Cloud Storage folder for storing exported documents.
- On the custom processor page, click **Configure Your Dataset** and then **Continue** to initialize the dataset.
Run the pipeline
----------------

### REST

    curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer ${AUTH_TOKEN}" \
      --data '{
        "name": "projects/PROJECT_NUMBER/locations/LOCATION",
        "export_cdw_pipeline": {
          "documents": [
            "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
          ],
          "export_folder_path": "gs://CLOUD_STORAGE_FOLDER",
          "doc_ai_dataset": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR/dataset",
          "training_split_ratio": RATIO
        },
        "request_metadata": {
          "user_info": {
            "id": "user:USER_EMAIL_ADDRESS"
          }
        }
      }'

Specify the training and test split ratio in the `training_split_ratio` field as a floating-point number. For example, for a set of 10 documents, if the ratio is specified as `0.8`, 8 documents are added to the training set and the remaining 2 documents to the test set.
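The split arithmetic can be sketched in Python. The rounding behavior shown here (training count rounded to the nearest integer) is an assumption that matches the documented 10-document example, not a guarantee of the service's exact behavior:

```python
def split_counts(total_docs: int, training_split_ratio: float) -> tuple[int, int]:
    """Estimate how many documents land in the training and test sets.

    Assumption: the training count is the ratio times the total,
    rounded to the nearest integer, as the 0.8 / 10-document
    example in the documentation suggests.
    """
    train = round(total_docs * training_split_ratio)
    test = total_docs - train
    return train, test

# For 10 documents and a ratio of 0.8: 8 training, 2 test.
print(split_counts(10, 0.8))
```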
This command returns a resource name for a long-running operation. Use it to
track the progress of the pipeline in the next step.
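One way to track the operation is a simple polling loop; a minimal sketch follows. The `fetch_operation` callable stands in for the authenticated GET request on the operation resource (shown in the next section) and is assumed to return the parsed JSON body; the `done` field is the standard long-running-operation completion flag. Function names and intervals here are illustrative:

```python
import time
from typing import Callable

def wait_for_operation(
    fetch_operation: Callable[[], dict],
    poll_interval_s: float = 10.0,
    timeout_s: float = 3600.0,
) -> dict:
    """Poll a long-running operation until its `done` field is true.

    `fetch_operation` should perform the authenticated
    GET .../operations/OPERATION call and return the response as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        op = fetch_operation()
        if op.get("done"):
            return op
        time.sleep(poll_interval_s)
    raise TimeoutError("operation did not complete before the timeout")
```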
Get long-running operation result
---------------------------------

### REST

    curl --location --request GET 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
      --header "Authorization: Bearer ${AUTH_TOKEN}"

Next step
---------

- Go to [Document AI](https://console.cloud.google.com/ai/document-ai) in the console to check the exported documents.

Last updated 2025-08-25 UTC.