Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Defines the structure for content warehouse document proto.
This message has `oneof`_ fields (mutually exclusive fields).
For each oneof, at most one member field can be set at the same time.
Setting any member of the oneof automatically clears all other
members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
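The clearing behavior described above can be illustrated with a small toy model. This is a minimal sketch in plain Python, not the proto-plus implementation: the `OneofGroup` class and the example bucket path are hypothetical and exist only to show that setting one member of a oneof discards the previously set member.

```python
class OneofGroup:
    """Toy model of a oneof: holds at most one (member, value) pair."""

    def __init__(self, *members):
        self._members = frozenset(members)
        self._active = None  # name of the member currently set, if any
        self._value = None

    def set(self, member, value):
        if member not in self._members:
            raise KeyError(f"{member!r} is not a member of this oneof")
        # Storing a new pair overwrites the old one, so the previously
        # set member is cleared implicitly.
        self._active, self._value = member, value

    def which(self):
        """Return the name of the member that is set, or None."""
        return self._active

    def get(self, member):
        """Return the value if `member` is the one currently set, else None."""
        return self._value if member == self._active else None


# Model the raw_document oneof of Document (member names from the table below).
raw_document = OneofGroup("raw_document_path", "inline_raw_document")
raw_document.set("raw_document_path", "gs://example-bucket/report.pdf")  # hypothetical path
raw_document.set("inline_raw_document", b"%PDF-1.7")  # replaces the path
```

After the second `set` call, only `inline_raw_document` is populated. The same rule applies to the `structured_content` oneof (`plain_text` and `cloud_ai_document`).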
Attributes

Name | Description |
---|---|
name |
str
The resource name of the document. Format: projects/{project_number}/locations/{location}/documents/{document_id}. The name is ignored when creating a document. |
reference_id |
str
The reference ID set by customers. Must be unique per project and location. |
display_name |
str
Required. Display name of the document given by the user. This name will be displayed in the UI. Customers can populate this field with the name of the document. This differs from the 'title' field, because 'title' is optional and stores the top heading in the document. |
title |
str
Title that describes the document. This can be the top heading or text that describes the document. |
display_uri |
str
URI used to display the document, for example, in the UI. |
document_schema_name |
str
The Document schema name. Format: projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}. |
plain_text |
str
Other document format, such as PPTX or XLSX. This field is a member of `oneof`_ ``structured_content``.
|
cloud_ai_document |
google.cloud.documentai_v1.types.Document
Document AI format to save the structured content, including OCR. This field is a member of `oneof`_ ``structured_content``.
|
structured_content_uri |
str
A path linked to the structured content file. |
raw_document_path |
str
Cloud Storage path of the raw document file. This field is a member of `oneof`_ ``raw_document``.
|
inline_raw_document |
bytes
Raw document content. This field is a member of `oneof`_ ``raw_document``.
|
properties |
MutableSequence[google.cloud.contentwarehouse_v1.types.Property]
List of values that are user-supplied metadata. |
update_time |
google.protobuf.timestamp_pb2.Timestamp
Output only. The time when the document was last updated. |
create_time |
google.protobuf.timestamp_pb2.Timestamp
Output only. The time when the document was created. |
raw_document_file_type |
google.cloud.contentwarehouse_v1.types.RawDocumentFileType
Used when the document was not loaded through DocAI and the inline_raw_document needs parsing or extraction. For example, if inline_raw_document is the byte representation of a PDF file, set this field to RAW_DOCUMENT_FILE_TYPE_PDF. |
async_enabled |
bool
If true, makes the document visible to asynchronous policies and rules. |
content_category |
google.cloud.contentwarehouse_v1.types.ContentCategory
Indicates the category (image, audio, video, etc.) of the original content. |
text_extraction_disabled |
bool
If true, text extraction will not be performed. |
text_extraction_enabled |
bool
If true, text extraction will be performed. |
creator |
str
The user who created the document. |
updater |
str
The user who last updated the document. |
disposition_time |
google.protobuf.timestamp_pb2.Timestamp
Output only. If linked to a Collection with RetentionPolicy, the date when the document becomes mutable. |
legal_hold |
bool
Output only. Indicates if the document has a legal hold on it. |
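
The `name` attribute above follows the format `projects/{project_number}/locations/{location}/documents/{document_id}`. As a rough illustration, that format can be built and parsed with the standard library alone. The helpers `document_name` and `parse_document_name` and the sample values are hypothetical and are not part of the client library.

```python
import re

# Pattern derived from the format given in the `name` attribute description.
_DOCUMENT_NAME = re.compile(
    r"projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/documents/(?P<document_id>[^/]+)"
)


def document_name(project_number: str, location: str, document_id: str) -> str:
    """Build a document resource name in the documented format."""
    return (
        f"projects/{project_number}/locations/{location}"
        f"/documents/{document_id}"
    )


def parse_document_name(name: str) -> dict:
    """Split a resource name into its parts; raise ValueError if malformed."""
    match = _DOCUMENT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a document resource name: {name!r}")
    return match.groupdict()


# Sample values are placeholders chosen for illustration only.
name = document_name("123456", "us", "doc-001")
parts = parse_document_name(name)
```

Because the name is ignored when creating a document, helpers like these would mainly be useful for get, update, or delete requests that reference an existing document.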