Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
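The clearing behavior described above can be modeled in plain Python. This is only an illustrative stand-in, not the proto-plus implementation: `SourceOneof` is a hypothetical class, and the member names `uri` and `content` are borrowed from the attribute table for this message.

```python
class SourceOneof:
    """Hypothetical stand-in that mimics oneof semantics for two members."""

    _MEMBERS = ("uri", "content")

    def __init__(self):
        self._active = None  # name of the member currently set, if any
        self._value = None

    def set(self, name, value):
        if name not in self._MEMBERS:
            raise AttributeError(f"{name!r} is not a member of this oneof")
        # Setting one member replaces whichever member was set before.
        self._active, self._value = name, value

    def get(self, name):
        # Unset members read back as None in this sketch.
        return self._value if name == self._active else None

    def which(self):
        return self._active


source = SourceOneof()
source.set("uri", "gs://bucket_name/object_name")
source.set("content", b"%PDF-1.7")  # clears "uri"
print(source.which())
print(source.get("uri"))
```

In this sketch a cleared member reads back as `None`; a real message would be expected to report the field's default value instead.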
Attributes

| Name | Type | Description |
|---|---|---|
| uri | str | Optional. Currently supports Google Cloud Storage URI of the form `gs://bucket_name/object_name`. Object versioning is not supported. For more information, refer to Google Cloud Storage Request URIs. |
| content | bytes | Optional. Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64. This field is a member of `oneof`_ ``source``. |
| mime_type | str | An IANA-published media type (MIME type). |
| text | str | Optional. UTF-8 encoded text in reading order from the document. |
| text_styles | MutableSequence[google.cloud.documentai_v1.types.Document.Style] | Styles for the Document.text. |
| pages | MutableSequence[google.cloud.documentai_v1.types.Document.Page] | Visual page layout for the Document. |
| entities | MutableSequence[google.cloud.documentai_v1.types.Document.Entity] | A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries. |
| entity_relations | MutableSequence[google.cloud.documentai_v1.types.Document.EntityRelation] | Placeholder. Relationship among Document.entities. |
| text_changes | MutableSequence[google.cloud.documentai_v1.types.Document.TextChange] | Placeholder. A list of text corrections made to Document.text. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other. |
| shard_info | google.cloud.documentai_v1.types.Document.ShardInfo | Information about the sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified. |
| error | google.rpc.status_pb2.Status | Any error that occurred while processing this document. |
| revisions | MutableSequence[google.cloud.documentai_v1.types.Document.Revision] | Placeholder. Revision history of this document. |
| document_layout | google.cloud.documentai_v1.types.Document.DocumentLayout | Parsed layout of the document. |
| chunked_document | google.cloud.documentai_v1.types.Document.ChunkedDocument | Document chunked based on chunking config. |
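The `uri` and `content` descriptions above imply two small chores when a document arrives as JSON: splitting a `gs://bucket_name/object_name` URI and decoding base64 content back to bytes. The following is a minimal sketch using only the standard library. The helper names `split_gcs_uri` and `read_source` are hypothetical, and the sample dictionaries (including the lowerCamelCase `mimeType` key) are assumed shapes for illustration, not output captured from the service.

```python
import base64


def split_gcs_uri(uri):
    """Split 'gs://bucket_name/object_name' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket, sep, obj = uri[len(prefix):].partition("/")
    if not bucket or not sep or not obj:
        raise ValueError(f"expected gs://bucket_name/object_name, got {uri!r}")
    return bucket, obj


def read_source(doc_json):
    """Return ('content', raw_bytes), ('uri', (bucket, object)), or (None, None)."""
    if "content" in doc_json:
        # JSON carries bytes fields as base64 text; decode to raw bytes.
        return "content", base64.b64decode(doc_json["content"])
    if "uri" in doc_json:
        return "uri", split_gcs_uri(doc_json["uri"])
    return None, None


# Assumed JSON shapes, built by hand for this example.
inline = {
    "content": base64.b64encode(b"%PDF-1.7").decode("ascii"),
    "mimeType": "application/pdf",
}
remote = {"uri": "gs://bucket_name/object_name"}

print(read_source(inline))
print(read_source(remote))
```

When working with the Python message class directly rather than JSON, `content` is already a `bytes` value and no base64 step should be needed.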
Classes
ChunkedDocument
ChunkedDocument(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Represents the chunks that the document is divided into.
DocumentLayout
DocumentLayout(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Represents the parsed layout of a document as a collection of blocks that the document is divided into.
Entity
Entity(mapping=None, *, ignore_unknown_fields=False, **kwargs)
An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.
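A common way to consume `Document.entities` is to group the detected mentions by entity type. The sketch below assumes each entity exposes `type_`, `mention_text`, and `confidence` attributes (field names taken from the Document AI API, not from this page). The `group_entities` helper and its `min_confidence` threshold are hypothetical, and `SimpleNamespace` objects stand in for a real `Document` so the example runs without the client library.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities(document, min_confidence=0.0):
    """Map entity type -> list of mention texts at or above the threshold."""
    grouped = defaultdict(list)
    for entity in document.entities:
        if entity.confidence >= min_confidence:
            grouped[entity.type_].append(entity.mention_text)
    return dict(grouped)


# Stand-in document; a real Document is assumed to offer the same attributes.
document = SimpleNamespace(
    entities=[
        SimpleNamespace(type_="person", mention_text="Ada Lovelace", confidence=0.98),
        SimpleNamespace(type_="organization", mention_text="Analytical Engines Ltd", confidence=0.91),
        SimpleNamespace(type_="person", mention_text="A. L.", confidence=0.40),
    ]
)

print(group_entities(document, min_confidence=0.5))
```

Because the helper relies only on attribute access, the same function should work on a processed `Document` as long as those attribute names hold.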
EntityRelation
EntityRelation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Relationship between Entities.
Page
Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A page in a Document.
PageAnchor
PageAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)
References the visual context of the entity in Document.pages. Page anchors can span multiple pages, consist of multiple bounding polygons, and optionally reference specific layout element types.
Provenance
Provenance(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Structure to identify provenance relationships between annotations in different revisions.
Revision
Revision(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Contains past or forward revisions of this document.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
ShardInfo
ShardInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
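When a large document comes back as several shards, the shard metadata can be used to put the pieces back in order. The sketch below assumes `ShardInfo` exposes `shard_index` and `shard_count` (names from the Document AI API, not listed on this page) and that each shard's `text` holds only its own portion of the overall text. `merge_shard_text` is a hypothetical helper, and `SimpleNamespace` objects stand in for real shard documents.

```python
from types import SimpleNamespace


def merge_shard_text(shards):
    """Concatenate shard texts in shard_index order, checking none are missing."""
    ordered = sorted(shards, key=lambda doc: doc.shard_info.shard_index)
    if not ordered:
        return ""
    expected = ordered[0].shard_info.shard_count
    indexes = [doc.shard_info.shard_index for doc in ordered]
    if indexes != list(range(expected)):
        raise ValueError(f"expected shards 0..{expected - 1}, got {indexes}")
    return "".join(doc.text for doc in ordered)


def make_shard(index, count, text):
    # Stand-in for a sharded Document; only the attributes used above are modeled.
    return SimpleNamespace(
        shard_info=SimpleNamespace(shard_index=index, shard_count=count),
        text=text,
    )


shards = [make_shard(1, 2, "second half"), make_shard(0, 2, "first half, ")]
print(merge_shard_text(shards))
```

The completeness check is a design choice for this sketch: failing loudly on a missing shard is usually safer than silently returning partial text.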
Style
Style(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
TextAnchor
TextAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Text reference indexing into the Document.text.
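To illustrate what "indexing into Document.text" means in practice, the sketch below resolves an anchor to the text it covers. It assumes a `TextAnchor` carries a `text_segments` list whose items have `start_index` and `end_index` attributes (names from the Document AI API, not listed on this page). `anchor_text` is a hypothetical helper, and `SimpleNamespace` objects stand in for the real message types.

```python
from types import SimpleNamespace


def anchor_text(document_text, text_anchor):
    """Join the slices of document_text referenced by each text segment."""
    return "".join(
        document_text[int(segment.start_index):int(segment.end_index)]
        for segment in text_anchor.text_segments
    )


text = "Invoice 42\nTotal: 10.00\n"
anchor = SimpleNamespace(
    text_segments=[
        SimpleNamespace(start_index=0, end_index=7),    # "Invoice"
        SimpleNamespace(start_index=11, end_index=16),  # "Total"
    ]
)

print(anchor_text(text, anchor))  # InvoiceTotal
```

The `int(...)` conversions are defensive, in case an index arrives as a string (for example, from a JSON export).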
TextChange
TextChange(mapping=None, *, ignore_unknown_fields=False, **kwargs)
This message is used for text changes, also known as OCR corrections.
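Since the `text_changes` attribute states that changes for a given revision may not overlap, a set of corrections can be applied safely by working from the end of the text toward the start, so earlier offsets stay valid. The sketch below assumes each change exposes a `text_anchor` (with `text_segments` carrying `start_index` and `end_index`) and a `changed_text` string, and that each change covers exactly one segment. These names and that restriction are assumptions for illustration, and `apply_text_changes` is a hypothetical helper.

```python
from types import SimpleNamespace


def apply_text_changes(text, changes):
    """Return text with each non-overlapping change applied."""
    spans = []
    for change in changes:
        segments = change.text_anchor.text_segments
        if len(segments) != 1:
            raise ValueError("this sketch handles one text segment per change")
        segment = segments[0]
        spans.append(
            (int(segment.start_index), int(segment.end_index), change.changed_text)
        )
    # Apply right to left so replacements do not shift pending offsets.
    for start, end, replacement in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


def make_change(start, end, changed_text):
    # Stand-in for a TextChange; only the attributes used above are modeled.
    segment = SimpleNamespace(start_index=start, end_index=end)
    return SimpleNamespace(
        text_anchor=SimpleNamespace(text_segments=[segment]),
        changed_text=changed_text,
    )


ocr_text = "lnvoice t0tal"
changes = [make_change(0, 1, "I"), make_change(9, 10, "o")]
print(apply_text_changes(ocr_text, changes))  # Invoice total
```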