Dataset

A singleton resource under a Processor that configures a collection of documents.

JSON representation

{
  "name": string,
  "state": enum (State),
  "satisfiesPzs": boolean,
  "satisfiesPzi": boolean,

  // Union field storage_source can be only one of the following:
  "gcsManagedConfig": {
    object (GCSManagedConfig)
  },
  "documentWarehouseConfig": {
    object (DocumentWarehouseConfig)
  },
  "unmanagedDatasetConfig": {
    object (UnmanagedDatasetConfig)
  }
  // End of list of possible types for union field storage_source.

  // Union field indexing_source can be only one of the following:
  "spannerIndexingConfig": {
    object (SpannerIndexingConfig)
  }
  // End of list of possible types for union field indexing_source.
}

Fields

`name`	`string` Dataset resource name. Format: `projects/{project}/locations/{location}/processors/{processor}/dataset`
`state`	`enum (State)` Required. State of the dataset. Ignored when updating the dataset.
`satisfiesPzs`	`boolean` Output only. Reserved for future use.
`satisfiesPzi`	`boolean` Output only. Reserved for future use.
Union field `storage_source`. `storage_source` can be only one of the following:
`gcsManagedConfig`	`object (GCSManagedConfig)` Optional. User-managed Cloud Storage dataset configuration. Use this configuration if the dataset documents are stored under a user-managed Cloud Storage location.
`documentWarehouseConfig` (deprecated)	`object (DocumentWarehouseConfig)` Optional. Deprecated. Warehouse-based dataset configuration is not supported.
`unmanagedDatasetConfig`	`object (UnmanagedDatasetConfig)` Optional. Unmanaged dataset configuration. Use this configuration if the dataset documents are managed by the document service internally (not user-managed).
Union field `indexing_source`. `indexing_source` can be only one of the following:
`spannerIndexingConfig`	`object (SpannerIndexingConfig)` Optional. A lightweight indexing source with low latency and high reliability, but without advanced features such as CMEK (customer-managed encryption keys) and content-based search.
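
The name format and the one-of rule for the two union fields can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API: the helper names `dataset_name` and `check_unions`, and the sample project, location, processor, and bucket values, are hypothetical.

```python
# Hypothetical client-side helpers; they are not part of the Document AI API.

# Members of each union field, as listed in the Fields table.
UNIONS = {
    "storage_source": (
        "gcsManagedConfig",
        "documentWarehouseConfig",
        "unmanagedDatasetConfig",
    ),
    "indexing_source": ("spannerIndexingConfig",),
}


def dataset_name(project: str, location: str, processor: str) -> str:
    """Build a Dataset resource name in the documented format."""
    return (
        f"projects/{project}/locations/{location}"
        f"/processors/{processor}/dataset"
    )


def check_unions(dataset: dict) -> None:
    """Raise ValueError if more than one member of a union field is set."""
    for union, members in UNIONS.items():
        present = [m for m in members if m in dataset]
        if len(present) > 1:
            raise ValueError(f"{union} allows only one of: {present}")


# All values below are placeholders.
dataset = {
    "name": dataset_name("my-project", "us", "my-processor"),
    "gcsManagedConfig": {
        "gcsPrefix": {"gcsUriPrefix": "gs://example-bucket/docs/"}
    },
}
check_unions(dataset)  # passes: only one storage_source member is set
```

Adding a second `storage_source` member, for example `unmanagedDatasetConfig`, to the same dictionary makes `check_unions` raise `ValueError`.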

GCSManagedConfig

Configuration specific to the Cloud Storage-based implementation.

JSON representation
{ "gcsPrefix": { object (`GcsPrefix`) } }

Fields

`gcsPrefix`	`object (GcsPrefix)` Required. The Cloud Storage URI (a directory) where the documents belonging to the dataset must be stored.

GcsPrefix

Specifies all documents on Cloud Storage with a common prefix.

JSON representation
{ "gcsUriPrefix": string }

Fields

`gcsUriPrefix`	`string` The URI prefix.
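
Put together, a `gcsManagedConfig` value nests a `GcsPrefix` object, which in turn holds the `gcsUriPrefix` string. A minimal Python sketch of that nesting follows. The helper name `gcs_managed_config` and the bucket path are hypothetical, and the `gs://` check is an assumption about the shape of a Cloud Storage URI, not a rule stated in this reference.

```python
import json


def gcs_managed_config(uri_prefix: str) -> dict:
    """Wrap a Cloud Storage URI prefix in the nested GCSManagedConfig shape."""
    # Assumption: Cloud Storage URIs use the gs:// scheme.
    if not uri_prefix.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri_prefix!r}")
    return {"gcsPrefix": {"gcsUriPrefix": uri_prefix}}


# Placeholder bucket and directory.
config = gcs_managed_config("gs://example-bucket/datasets/invoices/")
print(json.dumps({"gcsManagedConfig": config}, indent=2))
```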

DocumentWarehouseConfig

Configuration specific to the Document AI Warehouse-based implementation.

JSON representation
{ "collection": string, "schema": string }

Fields

`collection`	`string` Output only. The collection in Document AI Warehouse associated with the dataset.
`schema`	`string` Output only. The schema in Document AI Warehouse associated with the dataset.

UnmanagedDatasetConfig

Configuration specific to an unmanaged dataset.

This type has no fields.

SpannerIndexingConfig

Configuration specific to Spanner-based indexing.

This type has no fields.
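
Because `UnmanagedDatasetConfig` and `SpannerIndexingConfig` have no fields, selecting either one amounts to supplying an empty JSON object for the corresponding key. A brief Python sketch, assuming the usual JSON encoding of an empty message as `{}`; the variable name `dataset_body` is illustrative only.

```python
import json

# Empty objects select the unmanaged storage source and Spanner-based indexing.
dataset_body = {
    "unmanagedDatasetConfig": {},
    "spannerIndexingConfig": {},
}

print(json.dumps(dataset_body))
```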