A Dataset is a singleton resource under a Processor that configures a collection of documents.
JSON representation
{
  "name": string,
  "state": enum (State),
  "satisfiesPzs": boolean,
  "satisfiesPzi": boolean,

  // Union field storage_source can be only one of the following:
  "gcsManagedConfig": {
    object (GCSManagedConfig)
  },
  "documentWarehouseConfig": {
    object (DocumentWarehouseConfig)
  },
  "unmanagedDatasetConfig": {
    object (UnmanagedDatasetConfig)
  }
  // End of list of possible types for union field storage_source.

  // Union field indexing_source can be only one of the following:
  "spannerIndexingConfig": {
    object (SpannerIndexingConfig)
  }
  // End of list of possible types for union field indexing_source.
}
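The union-field comments in the JSON representation say that at most one member of `storage_source` and at most one member of `indexing_source` may be set. A minimal client-side check for that rule might look like the sketch below. The helper name `check_union_fields` is hypothetical and not part of any Google client library; only the field names are taken from the representation above.

```python
from typing import Any, Mapping

# Union members, copied from the JSON representation.
UNION_FIELDS = {
    "storage_source": (
        "gcsManagedConfig",
        "documentWarehouseConfig",
        "unmanagedDatasetConfig",
    ),
    "indexing_source": ("spannerIndexingConfig",),
}


def check_union_fields(body: Mapping[str, Any]) -> None:
    """Raise ValueError if more than one member of a union field is present.

    Hypothetical helper: it only inspects the top-level keys of a Dataset
    body represented as a plain dict.
    """
    for union_name, members in UNION_FIELDS.items():
        present = [name for name in members if name in body]
        if len(present) > 1:
            raise ValueError(
                f"{union_name} accepts only one of {members}, got {present}"
            )


if __name__ == "__main__":
    # One storage source and one indexing source: passes silently.
    check_union_fields({"unmanagedDatasetConfig": {}, "spannerIndexingConfig": {}})
```

A body that sets both `gcsManagedConfig` and `unmanagedDatasetConfig` would raise `ValueError` here, before any request is sent.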
gcsManagedConfig: Optional. User-managed Cloud Storage dataset configuration. Use this configuration if the dataset documents are stored under a user-managed Cloud Storage location.
unmanagedDatasetConfig: Optional. Unmanaged dataset configuration. Use this configuration if the dataset documents are managed by the document service internally (not user-managed).
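To make the choice between the two storage configurations concrete, the sketch below builds a Dataset body as a plain dict for either case. The function name `dataset_body` and the bucket path are hypothetical. The `gcsPrefix.gcsUriPrefix` nesting follows this page's description of GCSManagedConfig and should be treated as an assumption to verify against that type's reference.

```python
import json
from typing import Any, Dict, Optional


def dataset_body(gcs_uri_prefix: Optional[str] = None) -> Dict[str, Any]:
    """Build a Dataset body with exactly one storage_source member set.

    Hypothetical helper: pass a Cloud Storage prefix for a user-managed
    location, or nothing to let the service manage the documents.
    """
    if gcs_uri_prefix is not None:
        # Documents are stored under a user-managed Cloud Storage location.
        return {"gcsManagedConfig": {"gcsPrefix": {"gcsUriPrefix": gcs_uri_prefix}}}
    # Documents are managed internally; UnmanagedDatasetConfig has no fields.
    return {"unmanagedDatasetConfig": {}}


if __name__ == "__main__":
    # "gs://example-bucket/dataset/" is a placeholder, not a real location.
    print(json.dumps(dataset_body("gs://example-bucket/dataset/"), indent=2))
    print(json.dumps(dataset_body(), indent=2))
```

Returning from separate branches keeps the two union members mutually exclusive by construction, so no later check is needed for this field.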
GCSManagedConfig
----------------

Configuration specific to the Cloud Storage-based implementation. It requires a gcsPrefix, which defines the Cloud Storage directory where the dataset's documents are stored.

GcsPrefix
---------

Specifies all documents on Cloud Storage with a common prefix, given by gcsUriPrefix.

DocumentWarehouseConfig
-----------------------

Configuration specific to the Document AI Warehouse-based implementation. This configuration is deprecated and is not supported for creating new warehouse-based dataset configurations.

UnmanagedDatasetConfig
----------------------

Configuration specific to an unmanaged dataset. This type has no fields.

SpannerIndexingConfig
---------------------

Configuration specific to Spanner-based indexing, a lightweight option with low latency and high reliability. This type has no fields.

Last updated 2025-06-10 UTC.