This document describes how to manage document schemas in Document AI Warehouse, including create, get, list, update, and delete operations.
What are document schemas
Each document is of a certain document type and is specified by a schema.
A document schema defines the structure for a document type (for example, Invoice or Paystub) in Document AI Warehouse, where admins can specify Properties of different data types (Text | Numeric | Date | Enumeration).
Properties are used to represent the extracted data, classification tags, or other business tags appended to documents by AI or human users, for example, Invoice_Amount (numeric), Due_Date (date), or Supplier_Name (text).
Property Attributes: Each property can be declared as:
Filterable - can be used to filter search results
Searchable - indexed so it can be found in search queries
Required - ensures that the property exists in the document (We recommend saving most properties as required = false, unless the property must always be present.)
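As a rough illustration of how these attributes could map onto a schema definition, the following Python sketch builds property definitions as plain dictionaries. The is_searchable, is_required, and text_type_options field names appear elsewhere on this page. The is_filterable, float_type_options, and date_time_type_options names are assumed to follow the same pattern and should be checked against the API reference. The make_property helper is hypothetical.

```python
import json


def make_property(name, type_options_field, *, filterable=False,
                  searchable=False, required=False):
    """Build one property definition as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        "display_name": name.replace("_", " "),
        "is_filterable": filterable,   # assumed field name
        "is_searchable": searchable,
        "is_required": required,       # left False unless explicitly requested
        type_options_field: {},        # the options field selects the data type
    }


invoice_schema = {
    "display_name": "Invoice",
    "property_definitions": [
        # float_type_options and date_time_type_options are assumed names.
        make_property("Invoice_Amount", "float_type_options", filterable=True),
        make_property("Due_Date", "date_time_type_options", filterable=True),
        make_property("Supplier_Name", "text_type_options", searchable=True),
    ],
}

print(json.dumps(invoice_schema, indent=2))
```

Keeping required as an explicit keyword argument that defaults to False mirrors the recommendation above: a property becomes mandatory only when someone opts in.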
Extensible Schema: in some cases, end users with Edit access need to add schema properties to documents or delete them. This is supported by a "MAP property", which is a list of key-value pairs.
Each value in a MAP property can have one of the data types (Text | Numeric | Date | Enumeration).
For example, Invoice may contain a MAP property "Invoice_Entities" with the following key-value pairs:
Invoice_Amount (numeric) 1000
Due_Date (date) 12/24/2021
Supplier_Name (text) ABC Corp
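To make the idea concrete, the sketch below models the Invoice_Entities example as an ordinary Python dictionary and checks that every value is text, numeric, or a date. This is a conceptual picture only, not the API wire format. The check_map_values helper is hypothetical, and enumeration values are assumed to be represented as strings.

```python
from datetime import date
from numbers import Number

# Conceptual picture only: this is not the API wire format.
# Enumeration values are assumed to be carried as plain strings.
ALLOWED_VALUE_TYPES = (str, Number, date)

invoice_entities = {
    "Invoice_Amount": 1000,
    "Due_Date": date(2021, 12, 24),
    "Supplier_Name": "ABC Corp",
}


def check_map_values(entries):
    """Return the keys whose values are not text, numeric, or date."""
    return [
        key
        for key, value in entries.items()
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, ALLOWED_VALUE_TYPES)
    ]


print(check_map_values(invoice_entities))  # prints []
```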
Immutability of Schema: Schemas and schema properties can be added, but existing properties currently cannot be renamed, retyped, or removed, so define each schema carefully. The Update a schema section lists the few settings that can change.
Before you begin
Before you begin, make sure you have completed the steps on the Quickstart page.
Create a schema
Create a document schema.
REST
curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--data '{
  "display_name": "Test Doc Schema",
  "property_definitions": [
    {
      "name": "plaintiff",
      "display_name": "Plaintiff",
      "is_searchable": true,
      "is_repeatable": true,
      "text_type_options": {}
    }
  ]
}'
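The same request can be prepared with Python's standard library alone, which may help when the client library is not available. The sketch below builds the request but does not send it. The build_create_schema_request function is hypothetical, and the project number, location, and token values are placeholders.

```python
import json
import urllib.request

API_ROOT = "https://contentwarehouse.googleapis.com/v1"


def build_create_schema_request(project_number, location, schema, access_token):
    """Prepare (but do not send) the POST request that creates a schema."""
    url = f"{API_ROOT}/projects/{project_number}/locations/{location}/documentSchemas"
    return urllib.request.Request(
        url,
        data=json.dumps(schema).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


schema = {
    "display_name": "Test Doc Schema",
    "property_definitions": [
        {
            "name": "plaintiff",
            "display_name": "Plaintiff",
            "is_searchable": True,
            "is_repeatable": True,
            "text_type_options": {},
        }
    ],
}

# "123", "us", and "TOKEN" are placeholder values.
request = build_create_schema_request("123", "us", schema, "TOKEN")
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen with a real token.
```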
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
For more information, see the Document AI Warehouse Node.js API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Get a schema
Get details of a document schema.
REST
curl --request GET --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8"
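The path after /v1/ in this request identifies a single schema. A small sketch of building and splitting that path in Python is shown below. Both helper functions are hypothetical, and it is an assumption that schema names returned by the service use this same path layout.

```python
import re

# Pattern for the schema path used in the request URL above.
_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/documentSchemas/(?P<schema_id>[^/]+)$"
)


def schema_name(project_number, location, schema_id):
    """Join the three identifiers into a schema path."""
    return f"projects/{project_number}/locations/{location}/documentSchemas/{schema_id}"


def parse_schema_name(name):
    """Split a schema path back into its identifiers."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a document schema name: {name!r}")
    return match.groupdict()


# "123", "us", and "abc" are placeholder values.
name = schema_name("123", "us", "abc")
print("https://contentwarehouse.googleapis.com/v1/" + name)
print(parse_schema_name(name)["schema_id"])  # prints abc
```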
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
For more information, see the Document AI Warehouse Node.js API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
List schemas
List document schemas.
REST
curl --request GET --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8"
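A project can hold more schemas than fit in one response, so a caller usually needs to follow page tokens. The sketch below shows that loop in Python. It assumes the list response follows the common Google API convention of a documentSchemas array and a nextPageToken field, with pageSize and pageToken query parameters; confirm these names in the API reference. The fetch_page callable is hypothetical and stands in for the actual HTTP call.

```python
def iter_schemas(fetch_page, page_size=50):
    """Yield every schema, following page tokens until none is returned.

    fetch_page is a caller-supplied function (hypothetical) that takes a dict
    of query parameters, performs the GET request, and returns the decoded
    JSON response as a dict.
    """
    params = {"pageSize": page_size}
    while True:
        response = fetch_page(params)
        yield from response.get("documentSchemas", [])
        token = response.get("nextPageToken")
        if not token:
            break
        params = {"pageSize": page_size, "pageToken": token}


# Stand-in for the real HTTP call, returning two canned pages.
def fake_fetch(params):
    if "pageToken" not in params:
        return {"documentSchemas": [{"displayName": "A"}], "nextPageToken": "t1"}
    return {"documentSchemas": [{"displayName": "B"}]}


print([s["displayName"] for s in iter_schemas(fake_fetch)])  # prints ['A', 'B']
```

Passing the fetch function in as an argument keeps the paging logic separate from authentication and transport, so it can be exercised without network access.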
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Delete a schema
Delete a document schema.
REST
curl --request DELETE --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8"
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
For more information, see the Document AI Warehouse Node.js API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Update a schema
Update a document schema. Currently the update logic is largely limited to adding new property definitions; the other supported changes are listed under Supported. The updated document schema must include all property definitions present in the existing schema.
Supported:
- For existing properties, users can change the following metadata settings: is_repeatable, is_metadata, is_required.
- For existing ENUM properties, users can add new ENUM possible values or delete existing ENUM possible values. They can update the EnumTypeOptions.validation_check_disabled flag to disable the validation check. The validation check is used to make sure the ENUM values specified in the documents are in the range of possible ENUM values defined in the property definition when calling the CreateDocument API.
- Adding new property definitions is supported.
Not supported:
- For existing schema, updates to display_name and document_is_folder are not allowed.
- For existing properties, updates to name, display_name and value_type_options are not allowed.
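These rules can be checked locally before sending an update. The following Python sketch compares an existing schema with a proposed one and reports violations of the unsupported cases. The find_update_problems function is hypothetical. It assumes both schemas are plain dictionaries in the snake_case shape used by the request bodies on this page, and that value_type_options refers to whichever *_type_options field a property carries.

```python
def find_update_problems(existing, proposed):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Schema-level fields that cannot change.
    for field in ("display_name", "document_is_folder"):
        if existing.get(field) != proposed.get(field):
            problems.append(f"schema field {field} cannot change")

    proposed_by_name = {p["name"]: p for p in proposed.get("property_definitions", [])}
    for old in existing.get("property_definitions", []):
        name = old["name"]
        new = proposed_by_name.get(name)
        if new is None:
            # Every existing property must still be present under the same name.
            problems.append(f"property {name} is missing or was renamed")
            continue
        for field in sorted(set(old) | set(new)):
            # ENUM options are exempt because their values may be edited.
            locked = field == "display_name" or (
                field.endswith("_type_options") and field != "enum_type_options"
            )
            if locked and old.get(field) != new.get(field):
                problems.append(f"property {name}: {field} cannot change")
    return problems


existing = {
    "display_name": "Test Doc Schema",
    "property_definitions": [
        {"name": "plaintiff", "display_name": "Plaintiff", "text_type_options": {}},
    ],
}
proposed = {
    "display_name": "Test Doc Schema",
    "property_definitions": [
        {"name": "plaintiff", "display_name": "Plaintiff", "is_required": True,
         "text_type_options": {}},
        {"name": "defendant", "display_name": "Defendant", "text_type_options": {}},
    ],
}
print(find_update_problems(existing, proposed))  # prints []
```

The check is deliberately conservative about what it flags: it only reports the cases this page lists as not allowed, and the service remains the final authority on whether an update is accepted.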
REST
curl --request PATCH --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8" \
--data '{
  "document_schema": {
    "display_name": "Test Doc Schema",
    "property_definitions": [
      {
        "name": "plaintiff",
        "display_name": "Plaintiff",
        "is_repeatable": true,
        "text_type_options": {}
      }
    ]
  }
}'
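Because the update body must carry every existing property definition, a common pattern is to start from the current schema and append the new properties. A minimal Python sketch of that merge follows. The build_update_body function is hypothetical, and it assumes the existing schema is held in the same snake_case shape as the request body above; a schema fetched from the service may use different key casing and need converting first.

```python
import json


def build_update_body(existing_schema, new_properties):
    """Return the PATCH body: every existing property plus the new ones."""
    existing_props = existing_schema.get("property_definitions", [])
    known = {p["name"] for p in existing_props}
    duplicates = [p["name"] for p in new_properties if p["name"] in known]
    if duplicates:
        raise ValueError(f"properties already defined: {duplicates}")
    merged = dict(existing_schema)  # shallow copy; the input is left untouched
    merged["property_definitions"] = list(existing_props) + list(new_properties)
    return {"document_schema": merged}


current = {
    "display_name": "Test Doc Schema",
    "property_definitions": [
        {"name": "plaintiff", "display_name": "Plaintiff",
         "is_repeatable": True, "text_type_options": {}},
    ],
}
# "defendant" is an illustrative new property, not part of the original example.
addition = {"name": "defendant", "display_name": "Defendant", "text_type_options": {}}

body = build_update_body(current, [addition])
print(json.dumps(body, indent=2))
```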
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Next steps
- Proceed to Manage documents to learn how to manage documents.