Module document (0.14.2a0)

Wrappers for Document AI Document type.



    shards: typing.List[],
    gcs_bucket_name: typing.Optional[str] = None,
    gcs_prefix: typing.Optional[str] = None,
    gcs_uri: typing.Optional[str] = None,
    gcs_input_uri: typing.Optional[str] = None,

Represents a wrapped Document.

This class hides the complexity of working with the Document protobuf response output by the BatchProcessDocuments and ProcessDocument methods, and provides convenience methods for searching and extracting information within the Document.

Module Functions


    documentai_object: typing.Union[typing.Dict[str, typing.Dict], typing.List],
    text_offset: int,
) -> None

Applies a text offset to all text_segments in documentai_object.

Name Description
documentai_object object

Required. Document AI object to apply text_offset to.

text_offset int

Required. Text offset to apply, taken from Document.shard_info.text_offset.
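
A minimal sketch of the offset pass, assuming the object has already been converted to plain dicts and lists (for example with `documentai.Document.to_dict`) and that the offset is added to each segment's `start_index` and `end_index`. The name `apply_text_offset` is hypothetical and this is not the toolbox's own code.

```python
from typing import Any, Dict, List, Union

# Assumed JSON-like shape: nested dicts/lists where any "text_segments" key
# holds a list of {"start_index": ..., "end_index": ...} dicts.
DocObject = Union[Dict[str, Any], List[Any]]


def apply_text_offset(documentai_object: DocObject, text_offset: int) -> None:
    """Shift every text segment in place by text_offset (hypothetical sketch)."""
    if isinstance(documentai_object, dict):
        for key, value in documentai_object.items():
            if key == "text_segments":
                for segment in value:
                    # Indexes may be absent (proto3 default 0) or serialized as strings.
                    segment["start_index"] = int(segment.get("start_index", 0)) + text_offset
                    segment["end_index"] = int(segment.get("end_index", 0)) + text_offset
            else:
                # Recurse into nested objects such as pages, tokens, and entities.
                apply_text_offset(value, text_offset)
    elif isinstance(documentai_object, list):
        for item in documentai_object:
            apply_text_offset(item, text_offset)
```

The walk is in place and returns None, matching the signature above; scalar values (strings, numbers) are simply skipped.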


_bigquery_column_name(input_string: str) -> str

Converts a string into a BigQuery column name.

Name Description
input_string str

Required. The string to convert.
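
One possible conversion rule, sketched under the assumption that a safe column name uses only letters, digits, and underscores and does not start with a digit (the conservative BigQuery naming rule). The helper name `to_column_name` is hypothetical, and the exact rule applied by `_bigquery_column_name` may differ.

```python
import re


def to_column_name(input_string: str) -> str:
    """Hypothetical sketch: map an arbitrary label to a conservative column name."""
    # Replace every run of characters outside [A-Za-z0-9_] with one underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", input_string.strip())
    # Column names must not start with a digit; prefix an underscore if needed.
    if name and name[0].isdigit():
        name = "_" + name
    return name.lower()
```

For example, an entity type such as "Invoice Date" becomes `invoice_date`.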


    dic: typing.Dict[str, typing.Union[str, typing.List[str]]],
    dataset_name: str,
    table_name: str,
    project_id: typing.Optional[str],
) ->

Loads a dictionary into a BigQuery table.

Name Description
dic Dict[str, Union[str, List[str]]]

Required. The dictionary to insert.

dataset_name str

Required. Name of the BigQuery dataset.

table_name str

Required. Name of the BigQuery table.

project_id Optional[str]

Optional. Project ID containing the BigQuery table. If not passed, falls back to the default inferred from the environment.

Type Description
bigquery.job.LoadJob The BigQuery LoadJob for adding the dictionary.
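
A sketch of one way to perform such a load with the `google-cloud-bigquery` client library, assuming schema autodetection and append semantics are acceptable. `load_dict_to_bigquery` and `destination_table_id` are hypothetical names, not part of the toolbox API.

```python
from typing import Dict, List, Optional, Union

Row = Dict[str, Union[str, List[str]]]


def destination_table_id(dataset_name: str, table_name: str, project_id: Optional[str] = None) -> str:
    """Build a table ID; without project_id the client's default project applies."""
    parts = [dataset_name, table_name] if project_id is None else [project_id, dataset_name, table_name]
    return ".".join(parts)


def load_dict_to_bigquery(dic: Row, dataset_name: str, table_name: str, project_id: Optional[str] = None):
    """Hypothetical sketch: append one dictionary as a single row and return the LoadJob."""
    # Imported lazily so the helper above works without the BigQuery client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)  # project=None: inferred from the environment
    job_config = bigquery.LoadJobConfig(
        autodetect=True,  # infer the schema from the JSON row
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # load_table_from_json expects an iterable of rows, so wrap the dict in a list.
    return client.load_table_from_json(
        [dic], destination_table_id(dataset_name, table_name, project_id), job_config=job_config
    )
```

The load runs asynchronously; calling `.result()` on the returned job blocks until it finishes.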


    shards: typing.List[],
) -> typing.List[]

Returns a list of Entities and Properties from a list of documentai.Document shards.

Name Description
shards List[]

Required. List of document shards.

Type Description
List[Entity] A list of Entities.
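
The flattening can be sketched as follows, assuming each shard is a plain dict with an `entities` list and that nested `properties` are appended directly after their parent entity. `entities_from_shards` is a hypothetical name; per the return type above, the real function yields `Entity` objects rather than raw dicts.

```python
from typing import Any, Dict, List


def entities_from_shards(shards: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect top-level entities and their properties from every shard, in order."""
    result: List[Dict[str, Any]] = []
    for shard in shards:
        for entity in shard.get("entities", []):
            result.append(entity)
            # Nested properties (e.g. line-item fields) follow their parent entity.
            result.extend(entity.get("properties", []))
    return result
```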


    operation_name: str,
    location: typing.Optional[str] = None,
    timeout: typing.Optional[float] = None,
) ->

Gets BatchProcessMetadata from a batch_process_documents() long-running operation.

Name Description
operation_name str

Required. The fully qualified operation name for a batch_process_documents() operation.

location str

Optional. The location of the processor used for batch_process_documents(). Deprecated. Maintained for backwards compatibility.

timeout float

Optional. Time in seconds to wait for the operation to complete. Defaults to None, which waits indefinitely.

Type Description
documentai.BatchProcessMetadata Metadata from batch process.
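
The `timeout` semantics can be illustrated with a generic polling loop. This is a sketch only: `get_operation` stands in for whatever call fetches the operation's current state (a hypothetical callable returning a dict with `done` and `metadata` keys), and `wait_for_metadata` is not the toolbox's implementation.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_metadata(
    get_operation: Callable[[str], Dict[str, Any]],
    operation_name: str,
    timeout: Optional[float] = None,
    poll_interval: float = 1.0,
) -> Any:
    """Poll until the operation is done; timeout=None means wait indefinitely."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        operation = get_operation(operation_name)
        if operation.get("done"):
            return operation.get("metadata")
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"{operation_name} did not finish within {timeout} seconds")
        time.sleep(poll_interval)
```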


    gcs_bucket_name: str, gcs_prefix: str
) -> typing.List[]

Returns a list of documentai.Document shards from a Cloud Storage folder.

Name Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the JSON files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.

Type Description
List[] A list of documentai.Documents.
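
The relationship between a full `gs://` folder URI and the two parameters can be sketched in plain Python. `split_gcs_uri` and `json_shard_names` are hypothetical helpers, and filtering on the `.json` suffix is an assumption based on the `gcs_prefix` description.

```python
from typing import Iterable, List, Tuple


def split_gcs_uri(gcs_uri: str) -> Tuple[str, str]:
    """Split gs://{bucket_name}/{optional_folder}/{target_folder}/ into (bucket, prefix)."""
    scheme = "gs://"
    if not gcs_uri.startswith(scheme):
        raise ValueError(f"Expected a gs:// URI, got {gcs_uri!r}")
    bucket, _, prefix = gcs_uri[len(scheme):].strip("/").partition("/")
    return bucket, prefix


def json_shard_names(blob_names: Iterable[str], gcs_prefix: str) -> List[str]:
    """Keep object names under gcs_prefix that end in .json, preserving listing order."""
    return [name for name in blob_names if name.startswith(gcs_prefix) and name.endswith(".json")]
```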


    dic: typing.Dict[str, typing.Union[str, typing.List[str]]], key: str, value: str
) -> typing.Dict[str, typing.Union[str, typing.List[str]]]

Inserts a value into a dictionary whose values can be strings or lists of strings.

Name Description
dic Dict[str, Union[str, List[str]]]

Required. The dictionary to insert into.

key str

Required. The key to be created or inserted into.

value str

Required. The value to be inserted.

Type Description
Dict[str, Union[str, List[str]]] The dictionary after adding the key-value pair.
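
A minimal sketch of this behavior, assuming a repeated key turns a single string into a list and later values are appended to it (useful, for instance, when the same entity type occurs more than once). `insert_with_list` is a hypothetical name.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def insert_with_list(dic: Dict[str, Value], key: str, value: str) -> Dict[str, Value]:
    """Insert value under key, promoting to a list when the key repeats."""
    if key not in dic:
        dic[key] = value
    elif isinstance(dic[key], list):
        dic[key].append(value)
    else:
        # Second occurrence: keep the existing string and the new one together.
        dic[key] = [dic[key], value]
    return dic
```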


    shards: typing.List[],
) -> typing.List[]

Returns a list of Pages from a list of documentai.Document shards.

Name Description
shards List[]

Required. List of document shards.

Type Description
List[Page] A list of Pages.
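
A sketch of the same flattening for pages, assuming each page's text anchors index into its own shard's `text`, so each page is kept alongside that text. `pages_from_shards` is a hypothetical name; per the return type above, the real function yields `Page` objects rather than tuples.

```python
from typing import Any, Dict, List, Tuple


def pages_from_shards(shards: List[Dict[str, Any]]) -> List[Tuple[Dict[str, Any], str]]:
    """Return (page, shard_text) pairs across all shards, preserving shard order."""
    result: List[Tuple[Dict[str, Any], str]] = []
    for shard in shards:
        shard_text = shard.get("text", "")
        for page in shard.get("pages", []):
            # Pair each page with the text its (assumed shard-local) anchors refer to.
            result.append((page, shard_text))
    return result
```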