This document describes how to manage the documents in Document AI Warehouse, including create, fetch, update, and delete operations.
What are documents
A document is the data model used in Document AI Warehouse to organize a real-world document (for example, PDF or TXT) and its associated properties. You interact with Document AI Warehouse by operations on the documents.
File types supported
While the focus of Document AI Warehouse is documents, it's also used to manage associated images (for example, in verticals such as insurance, engineering, construction, and research).
- The Ingest API supports PDFs and TIFF, JPEG, and PNG images, along with any properties or pre-extracted text.
- The Upload UI supports the extraction of PDFs using Document AI OCR and custom processors.
- The Viewer UI supports rendering in PDF, text, and Microsoft Office files.
Before you begin
Before you begin, make sure you have completed the Quickstart page.
For document creation, if your data resides in your own Cloud Storage bucket, you have to give the Document AI Warehouse service account storage object viewer permission to read your data.
Each document is specified by a schema and belongs to a document type. A document schema defines the document structure in Document AI Warehouse. Before you can create documents, you need to create a document schema.
Create a document
To create a document, you need to provide raw document content to
Document AI Warehouse. The two ways to provide raw document byte content are by
setting either Document.inline_raw_document
or Document.raw_document_path
.
The differences are as follows:
Document.raw_document_path
: This is the preferred approach. It uses the Cloud Storage path (gs://bucket/object) of the file to be ingested. Note that the caller needs to have read permission on this object for the call to succeed.Document.inline_raw_document
: byte/text representation of the file, provided directly to the endpoint.
To create a document, do the following:
Upload a document from Cloud Storage
You need to grant the Document AI Warehouse service account access to your Cloud Storage bucket as described in the prerequisite section.
You need to upload your file onto a Cloud Storage bucket, following the instructions.
REST
Request:
curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=utf-8" \
--data '{
"document": {
"display_name": "TestDoc3",
"document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID",
"raw_document_path": "gs://BUCKET_URI/FILE_URI",
"properties": [
{
"name": "supplier_name",
"text_values": {
"values": "Stanford Plumbing & Heating"
}
},
{
"name": "total_amount",
"float_values": {
"values": "1091.81"
}
},
]
},
"requestMetadata":{
"userInfo":{
"id": "user:USER_EMAIL_ID"
}
}
}'
Upload from a local machine
REST
Request:
curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/ \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=utf-8" \
--data '{
"document": {
"display_name": "TestDoc3",
"document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID",
"inline_raw_document": "<bytes>",
"properties": [
{
"name": "supplier_name",
"text_values": {
"values": "Stanford Plumbing & Heating"
}
},
{
"name": "total_amount",
"float_values": {
"values": "1091.81"
}
},
]
},
"requestMetadata": {
"userInfo": {
"id": "user:USER_EMAIL_ID"
}
}
}'
Get a document
By document_id
:
REST
curl --request POST \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8" -d '{
"requestMetadata":{
"userInfo":{
"id": "user:USER_EMAIL"
}
}
}' \
"https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT_ID:get"
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
For more information, see the Document AI Warehouse Node.js API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
By reference_id
:
curl --request POST \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8" -d '{
"requestMetadata":{
"userInfo":{
"id": "user:USER_EMAIL"
}
}
}' \
"https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/referenceId/REFERENCE_ID:get"
Update a document
By document_id
:
REST
posix-terminal
curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=utf-8" \
--data '{
"document": {
"display_name": "TestDoc3",
"document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID",
"raw_document_path": "gs://BUCKET_URI/FILE_URI",
"properties": [
{
"name": "supplier_name",
"text_values": {
"values": "Stanford Plumbing & Heating"
}
},
{
"name": "total_amount",
"float_values": {
"values": "1091.81"
}
},
{
"name": "invoice_id",
"text_values": {
"values": "invoiceid"
}
},
]
},
"requestMetadata": {
"userInfo": {
"id": "user:USER_EMAIL"
}
}
}'
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
By reference_id
:
curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=utf-8" \
--data '{
"document": {
"display_name": "TestDoc3",
"document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/referenceId/REFERENCE_ID",
"raw_document_path": "gs://BUCKET_URI/FILE_URI",
"properties": [
{
"name": "supplier_name",
"text_values": {
"values": "Stanford Plumbing & Heating"
}
},
{
"name": "total_amount",
"float_values": {
"values": "1091.81"
}
},
{
"name": "invoice_id",
"text_values": {
"values": "invoiceid"
}
},
]
},
"requestMetadata": {
"userInfo": {
"id": "user:USER_EMAIL"
}
}
}'
Delete a document
REST
By document_id
:
curl --request POST \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8" -d '{
"requestMetadata":{
"userInfo":{
"id": "user:USER_EMAIL"
}
}
}' \
"https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT_ID:delete"
By reference_id
:
curl --request POST \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8" -d '{
"requestMetadata":{
"userInfo":{
"id": "user:USER_EMAIL"
}
}
}' \
"https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/referenceId/REFERENCE_ID":delete"
Next steps
- Proceed to Organize documents in folders to learn how to organize documents in folders.