Class GcsIngestPipeline (0.7.7)

GcsIngestPipeline(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The configuration of the Cloud Storage Ingestion pipeline.

Attributes

Name Description
input_path str
The input Cloud Storage folder. All files under this folder will be imported to Document Warehouse. Format: gs://.
schema_name str
The Document Warehouse schema resource name. All documents processed by this pipeline will use this schema. Format: projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}.
processor_type str
The Doc AI processor type name. Only used when the format of ingested files is Doc AI Document proto format.
skip_ingested_documents bool
Whether to skip documents that have already been ingested. If set to true, documents in Cloud Storage whose custom metadata contains the key "status" with the value "status=ingested" are skipped during ingestion.
pipeline_config google.cloud.contentwarehouse_v1.types.IngestPipelineConfig
Optional. The config for the Cloud Storage Ingestion pipeline. It provides additional customization options for running the pipeline and can be omitted if it is not applicable.
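
Example

The following is a minimal sketch of how the attributes above might be assembled, not an official sample. It relies on the `mapping` argument shown in the constructor signature. The helper `build_gcs_ingest_mapping`, the bucket name, the project number, and the schema ID are all hypothetical placeholders introduced for illustration. The import is guarded so the snippet still runs when the `google-cloud-contentwarehouse` package is not installed.

```python
def build_gcs_ingest_mapping(input_path, schema_name,
                             processor_type="",
                             skip_ingested_documents=False):
    """Hypothetical helper: return a dict keyed by the documented field names."""
    # The attribute table says input_path is a Cloud Storage location (gs://...).
    if not input_path.startswith("gs://"):
        raise ValueError("input_path must start with gs://")
    mapping = {
        "input_path": input_path,
        "schema_name": schema_name,
        "skip_ingested_documents": skip_ingested_documents,
    }
    # processor_type only applies to files in Doc AI Document proto format,
    # so it is left out unless the caller supplies it.
    if processor_type:
        mapping["processor_type"] = processor_type
    return mapping


# Placeholder values; substitute your own bucket, project, and schema.
mapping = build_gcs_ingest_mapping(
    input_path="gs://example-bucket/invoices",
    schema_name="projects/123/locations/us/documentSchemas/example-schema",
    skip_ingested_documents=True,
)

try:
    from google.cloud.contentwarehouse_v1.types import GcsIngestPipeline
except ImportError:
    # Library not installed; only the plain mapping is available.
    pipeline = None
else:
    # The constructor accepts a mapping as its first positional argument.
    pipeline = GcsIngestPipeline(mapping)
```

Validating the `gs://` prefix before constructing the message is a design choice of this sketch only; the library itself may apply different or no client-side checks. `pipeline_config` is omitted here because it is optional.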