This page gives a brief overview of the Policy Engine and explains how to manage documents using rules.
Policy Engine and rules
In Document Warehouse, the Policy Engine lets users define and execute common operations on documents (for example, validate or update) while creating or updating documents.
Rules and RuleSet
A Rule at a high level refers to a user-defined configuration that specifies the following:
- what triggers the rule check,
- which condition is evaluated, and
- what actions are run when the condition is satisfied.
Along with these specifications, a rule includes a description, source, target, and triggering condition.
A logical collection of rules is called a RuleSet. For example, Rules operating on the same schema can be grouped together into a single RuleSet. Customers can define multiple RuleSets.
Rules are useful for automatically triggering predefined actions while creating or updating documents.
A Rule consists of three main parts:
- TriggerType: Event on which the rule check should be initiated. Create and Update are the supported trigger types.
- Rule Condition: The condition that is evaluated after a trigger type is detected. Conditions are expressed using Common Expression Language (CEL). Each condition should evaluate to a Boolean value.
- Actions: Set of steps that are executed when the rule is satisfied. When a rule condition is evaluated as true, then the corresponding action (configured in the rule) is executed. The following are the high-level details about specific actions implemented in Document Warehouse:
  - Data validation action: Action that enables validating specific fields in the document during document creation or update.
  - Data update action: Action that enables updating specific fields in the document during document creation or update. Such updates are run when the rule condition is satisfied.
  - Delete document action: Action that enables deleting the document during document update when certain fields meet the deletion criteria defined using rule conditions.
  - Folder inclusion action: Action that automatically adds a new document (or updated document) under specific folders. Such folders can be specified directly using their name.
  - Remove from folder action: Action that automatically removes a new document from given folders when a rule-level condition is satisfied.
  - Access control action: Action that allows updating the access control lists (groups and user bindings) during document creation. Such updates are run when the rule condition is satisfied.
  - Publish action: Action that publishes specific messages to the user's Pub/Sub channel when a rule-level condition is satisfied.
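The trigger, condition, and action flow described above can be sketched in plain Python. This is only an illustration of the evaluation order, not how Document Warehouse implements it: the `Rule` class and the `apply_rules` function are hypothetical, and conditions are modelled as Python callables rather than CEL expressions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical in-memory model of a rule (not the service's data model)."""
    trigger_type: str                           # "ON_CREATE" or "ON_UPDATE"
    condition: Callable[[Document], bool]       # stand-in for a CEL expression
    actions: List[Callable[[Document], None]]   # run only if the condition is true
    enabled: bool = True


def apply_rules(rules: List[Rule], trigger_type: str, doc: Document) -> int:
    """Run the actions of every enabled, matching rule; return how many rules fired."""
    fired = 0
    for rule in rules:
        if not rule.enabled or rule.trigger_type != trigger_type:
            continue
        if rule.condition(doc):
            for action in rule.actions:
                action(doc)
            fired += 1
    return fired


def set_address(doc: Document) -> None:
    # Stand-in for a data update action.
    doc["Address"] = "1600 Amphitheatre Pkwy"


rules = [
    Rule(
        trigger_type="ON_CREATE",
        condition=lambda d: d.get("documentType") == "W9"
        and d.get("BUSINESS_NAME") == "Google",
        actions=[set_address],
    )
]

doc = {"documentType": "W9", "BUSINESS_NAME": "Google"}
fired_on_create = apply_rules(rules, "ON_CREATE", doc)
```

The same rule does nothing for an `ON_UPDATE` trigger, because the trigger type is checked before the condition is evaluated.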
Manage RuleSets
Document Warehouse provides APIs to manage RuleSets (Create, Get, Update, Delete, List). This section provides samples for configuring different types of rules.
Create a RuleSet
To create a rule-set, do the following:
REST
Request:
# Create a RuleSet for data validation.
# Inside the single-quoted request body, '\'' produces a literal single quote.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "trigger_type": "ON_CREATE",
        "condition": "documentType == '\''W9'\'' && STATE =='\''CA'\''",
        "actions": {
          "data_validation": {
            "conditions": {
              "NAME": "NAME != '\'''\''",
              "FILING_COST": "FILING_COST > 10.0"
            }
          }
        },
        "enabled": true
      }
    ],
    "description": "W9: Basic validation check rules."
  }'
Response
{
  "description": "W9: Basic validation check rules.",
  "name": "RULE_SET_NAME",
  "rules": [
    {
      "actions": [
        {
          "actionId": "de0e6b84-106b-44ba-b1c4-0b3ad6ddc719",
          "dataValidation": {
            "conditions": {
              "FILING_COST": "FILING_COST > 10.0",
              "NAME": "NAME != ''"
            }
          }
        }
      ],
      "condition": "documentType == 'W9' && STATE =='CA'",
      "enabled": true,
      "triggerType": "ON_CREATE"
    }
  ]
}
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
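As a rough illustration, the same create request can be assembled with the Python standard library alone, which also avoids shell quoting of the CEL string literals. This is a sketch, not the client-library sample: `build_create_rule_set_request` is a hypothetical helper, the access token is assumed to come from `gcloud auth print-access-token`, and the request is only built here, not sent.

```python
import json
import urllib.request

API_ROOT = "https://contentwarehouse.googleapis.com/v1"


def build_create_rule_set_request(project_number, location, access_token):
    """Build (but do not send) the POST request that creates the W9 RuleSet."""
    body = {
        "rules": [
            {
                "trigger_type": "ON_CREATE",
                "condition": "documentType == 'W9' && STATE == 'CA'",
                "actions": {
                    "data_validation": {
                        "conditions": {
                            "NAME": "NAME != ''",
                            "FILING_COST": "FILING_COST > 10.0",
                        }
                    }
                },
                "enabled": True,
            }
        ],
        "description": "W9: Basic validation check rules.",
    }
    url = f"{API_ROOT}/projects/{project_number}/locations/{location}/ruleSets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values; replace them before sending with urllib.request.urlopen(request).
request = build_create_rule_set_request("PROJECT_NUMBER", "LOCATION", "ACCESS_TOKEN")
payload = json.loads(request.data)
```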
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
List RuleSets
To list rule-sets under a project, do the following:
REST
Request:
# List all rule-sets for a project.
curl -X GET -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets
Response
{
  "ruleSets": [
    {
      "description": "W9: Basic validation check rules.",
      "rules": [
        {
          "triggerType": "ON_CREATE",
          "condition": "documentType == 'W9' && STATE =='CA'",
          "actions": [
            {
              "actionId": "fcf79ae8-9a1f-4462-9262-eb2e7161350c",
              "dataValidation": {
                "conditions": {
                  "NAME": "NAME != ''",
                  "FILING_COST": "FILING_COST > 10.0"
                }
              }
            }
          ],
          "enabled": true
        }
      ],
      "name": "RULE_SET_NAME"
    }
  ]
}
Get a RuleSet
To get a rule-set using the rule-set name, do the following:
REST
Request:
# Get a rule-set using rule-set ID.
curl -X GET -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets/RULE_SET
Response
{
  "description": "W9: Basic validation check rules.",
  "rules": [
    {
      "triggerType": "ON_CREATE",
      "condition": "documentType == 'W9' && STATE =='CA'",
      "actions": [
        {
          "actionId": "7559346b-ec9f-4143-ab1c-1912f5588807",
          "dataValidation": {
            "conditions": {
              "NAME": "NAME != ''",
              "FILING_COST": "FILING_COST > 10.0"
            }
          }
        }
      ],
      "enabled": true
    }
  ],
  "name": "RULE_SET_NAME"
}
Delete a RuleSet
To delete a rule-set using the rule-set name, do the following:
REST
Request:
# Delete a rule-set using rule-set ID.
curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets/RULE_SET
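The list, get, and delete calls differ only in the HTTP method and in whether a RuleSet ID is appended to the collection URL. A small sketch of that pattern follows, using a hypothetical `rule_set_request` helper that builds the requests without sending them; the placeholder values match the ones used in the curl commands.

```python
import urllib.request

API_ROOT = "https://contentwarehouse.googleapis.com/v1"


def rule_set_request(method, project_number, location, access_token, rule_set_id=None):
    """Build a list request (no ID) or a get/delete request (with ID) for RuleSets."""
    url = f"{API_ROOT}/projects/{project_number}/locations/{location}/ruleSets"
    if rule_set_id is not None:
        url = f"{url}/{rule_set_id}"
    elif method != "GET":
        # Only listing works on the collection URL in this helper.
        raise ValueError(f"{method} needs a RuleSet ID")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}, method=method
    )


list_req = rule_set_request("GET", "PROJECT_NUMBER", "LOCATION", "ACCESS_TOKEN")
get_req = rule_set_request("GET", "PROJECT_NUMBER", "LOCATION", "ACCESS_TOKEN", "RULE_SET")
delete_req = rule_set_request("DELETE", "PROJECT_NUMBER", "LOCATION", "ACCESS_TOKEN", "RULE_SET")
```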
Rule Actions
This section describes rule expressions and gives an example of each rule action.
Sample conditions
A condition refers to the expression specified using Common Expression Language.
Examples:
- String field expression
  - STATE == 'CA': Check if the value of the STATE field is equal to CA.
  - NAME != '': Check that the value of the NAME field is not empty.
- Numeric field expression
  - FILING_COST > 10.0: Check if the value of the FILING_COST field (defined as float) is greater than 10.0.
How to check if a document belongs to a specific schema
To refer to a specific schema type, use the special field name documentType (it is a reserved word). It is evaluated against the DisplayName field in the DocumentSchema.
Example:
documentType == 'W9'
The preceding condition checks if the schema of the document (using the keyword documentType) has a display name of W9.
How to refer to old/existing document property values and new document property values
To support conditions that include existing and newly given properties, use the following two prefixes with the dot operator to access the specific version of the property:
- OLD_ to refer to existing document properties.
- NEW_ to refer to new document properties in the request.
Example:
OLD_.state == 'TX' && NEW_.state == 'CA'
Checks that the existing value of the state property is TX and the new value given is CA.
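To make the semantics concrete, the condition above can be mirrored in plain Python. This is only an illustration: `state_changed_tx_to_ca` is a hypothetical function, and the two dictionaries stand in for the stored document (`OLD_`) and the properties in the update request (`NEW_`).

```python
def state_changed_tx_to_ca(old_props: dict, new_props: dict) -> bool:
    """Python equivalent of: OLD_.state == 'TX' && NEW_.state == 'CA'."""
    return old_props.get("state") == "TX" and new_props.get("state") == "CA"


stored = {"state": "TX"}       # existing document properties (OLD_)
requested = {"state": "CA"}    # properties in the update request (NEW_)

matches = state_changed_tx_to_ca(stored, requested)
no_change = state_changed_tx_to_ca(stored, {"state": "TX"})
```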
Date fields handling
For the DriverLicense document, if the EXPIRATION_DATE is before a certain date:
- Update (or add if absent) EXPIRATION_STATUS (enum field) with a value equal to EXPIRING_BEFORE_CLOSING_DATE.
To add date values, use the timestamp function as shown in the following example.
REST
Request:
# Check if the document expires before a date and update the status field.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "trigger_type": "ON_CREATE",
        "description": "Expiration date check rule",
        "condition": "documentType=='\''DriverLicense'\'' && EXPIRATION_DATE < timestamp('\''2021-08-01T00:00:00Z'\'')",
        "actions": {
          "data_update": {
            "entries": {
              "EXPIRATION_STATUS": "EXPIRING_BEFORE_CLOSING_DATE"
            }
          }
        }
      }
    ]
  }'
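When the cutoff date is computed rather than typed by hand, the condition string can be generated. The sketch below assumes the RFC 3339 UTC format used in the example request; `expires_before_condition` is a hypothetical helper, not part of any Document Warehouse library.

```python
from datetime import datetime, timezone


def expires_before_condition(schema_name: str, field: str, cutoff: datetime) -> str:
    """Return a CEL condition that is true when `field` is earlier than `cutoff`."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    # Normalize to UTC and format as RFC 3339, for example 2021-08-01T00:00:00Z.
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"documentType == '{schema_name}' && {field} < timestamp('{stamp}')"


condition = expires_before_condition(
    "DriverLicense", "EXPIRATION_DATE", datetime(2021, 8, 1, tzinfo=timezone.utc)
)
```

Rejecting naive datetimes keeps the generated timestamp unambiguous, since the `Z` suffix always denotes UTC.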
Data validation rule
Validate a W9 document for the STATE (text field) California:
- Check that the NAME (text field) is non-empty.
- Check that the FILING_COST (float field) is greater than 10.0.
REST
Request:
# Rules for data validation.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "trigger_type": "ON_CREATE",
        "condition": "documentType == '\''W9'\'' && STATE =='\''CA'\''",
        "actions": {
          "data_validation": {
            "conditions": {
              "NAME": "NAME != '\'''\''",
              "FILING_COST": "FILING_COST > 10.0"
            }
          }
        },
        "enabled": true
      }
    ],
    "description": "W9: Basic validation check rules."
  }'
Data update rule
For a W9 document, if the BUSINESS_NAME field is Google:
- Update (or add if absent) an Address field equal to 1600 Amphitheatre Pkwy.
- Update (or add if absent) an EIN field equal to 776666666.
REST
Request:
# Rule for data update.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "description": "W9: Rule to update address data and EIN.",
        "trigger_type": "ON_CREATE",
        "condition": "documentType=='\''W9'\'' && BUSINESS_NAME == '\''Google'\''",
        "actions": {
          "data_update": {
            "entries": {
              "Address": "1600 Amphitheatre Pkwy",
              "EIN": "776666666"
            }
          }
        }
      }
    ]
  }'
Document delete rule
While updating the W9 document, if the BUSINESS_NAME field is changed to Google, then delete the document.
REST
Request:
# Rule for deleting the document.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "description": "W9: Rule to delete the document during update.",
        "trigger_type": "ON_UPDATE",
        "condition": "documentType == '\''W9'\'' && BUSINESS_NAME == '\''Google'\''",
        "actions": {
          "delete_document_action": {
            "enable_hard_delete": true
          }
        }
      }
    ]
  }'
Access control rule
While creating the W9 document, if the BUSINESS_NAME field is Google, then update the policy bindings that control access to the document.
Add new binding
When a document satisfies the rule condition:
- Add the Editor role for user:a@example.com and group:xxx@example.com.
- Add the Viewer role for user:b@example.com and group:yyy@example.com.
REST
Request:
# Rule for adding new policy binding while creating the document.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "description": "W9: Rule to add new policy binding.",
        "trigger_type": "ON_CREATE",
        "condition": "documentType == '\''aca13aa9-6d0d-4b6b-a1eb-315dcb876bd1'\'' && BUSINESS_NAME == '\''Google'\''",
        "actions": {
          "access_control": {
            "operation_type": "ADD_POLICY_BINDING",
            "policy": {
              "bindings": [
                {
                  "role": "roles/contentwarehouse.documentEditor",
                  "members": ["user:a@example.com", "group:xxx@example.com"]
                },
                {
                  "role": "roles/contentwarehouse.documentViewer",
                  "members": ["user:b@example.com", "group:yyy@example.com"]
                }
              ]
            }
          }
        }
      }
    ]
  }'
Replace an existing binding
When a document satisfies the rule condition, replace the existing binding to include only the Editor role for user:a@example.com and group:xxx@example.com.
REST
Request:
# Rule for replacing existing policy bindings with newly given bindings.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "description": "W9: Rule to replace policy binding.",
        "trigger_type": "ON_CREATE",
        "condition": "documentType == '\''a9e37d07-9cfa-4b4d-b372-53162e3b8bd9'\'' && BUSINESS_NAME == '\''Google'\''",
        "actions": {
          "access_control": {
            "operation_type": "REPLACE_POLICY_BINDING",
            "policy": {
              "bindings": [
                {
                  "role": "roles/contentwarehouse.documentEditor",
                  "members": ["user:a@example.com", "group:xxx@example.com"]
                }
              ]
            }
          }
        }
      }
    ]
  }'
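The difference between the two operation types can be pictured with a small Python sketch. It reflects one reading of the descriptions above (adding merges the new members into the existing bindings, replacing discards the existing bindings) and is an assumption about the behavior, not the service's implementation; `apply_bindings` is a hypothetical helper.

```python
def apply_bindings(existing, new, operation_type):
    """Return the bindings after applying `new` to `existing`.

    Bindings are modelled as {role: set_of_members}. Assumed semantics:
    ADD_POLICY_BINDING merges members per role; REPLACE_POLICY_BINDING
    keeps only the new bindings.
    """
    if operation_type == "REPLACE_POLICY_BINDING":
        return {role: set(members) for role, members in new.items()}
    if operation_type == "ADD_POLICY_BINDING":
        merged = {role: set(members) for role, members in existing.items()}
        for role, members in new.items():
            merged.setdefault(role, set()).update(members)
        return merged
    raise ValueError(f"unsupported operation type: {operation_type}")


EDITOR = "roles/contentwarehouse.documentEditor"
VIEWER = "roles/contentwarehouse.documentViewer"

existing = {VIEWER: {"user:b@example.com"}}
new = {EDITOR: {"user:a@example.com", "group:xxx@example.com"}}

added = apply_bindings(existing, new, "ADD_POLICY_BINDING")
replaced = apply_bindings(existing, new, "REPLACE_POLICY_BINDING")
```

Under these assumptions, `added` keeps the Viewer binding alongside the new Editor binding, while `replaced` contains the Editor binding only.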
Add to folder rule
When a document is created or updated, it can be added under predefined static folders or folders matching certain search criteria.
Configure static folders
When a new DriverLicense document is created, add it under an already created folder.
REST
Request:
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "trigger_type": "ON_CREATE",
        "condition": "documentType == '\''DriverLicense'\''",
        "actions": {
          "add_to_folder": {
            "folders": ["projects/821411934445/locations/us/documents/445en119hqp70"]
          }
        }
      }
    ]
  }'
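In the request above, the folder is referenced by its document resource name. The following sketch composes such a name and the matching rule body in Python; `folder_name` and `add_to_folder_rule` are hypothetical helpers, and the IDs are the sample values from the request.

```python
import json


def folder_name(project_number: str, location: str, document_id: str) -> str:
    """Compose a folder's resource name in the format used by the request above."""
    return f"projects/{project_number}/locations/{location}/documents/{document_id}"


def add_to_folder_rule(schema_name: str, folders: list) -> dict:
    """Build one ON_CREATE rule that files new documents of a schema into folders."""
    return {
        "trigger_type": "ON_CREATE",
        "condition": f"documentType == '{schema_name}'",
        "actions": {"add_to_folder": {"folders": list(folders)}},
    }


rule = add_to_folder_rule(
    "DriverLicense", [folder_name("821411934445", "us", "445en119hqp70")]
)
body = json.dumps({"rules": [rule]}, indent=2)  # request body for the ruleSets endpoint
```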
Publish to Pub/Sub
When a document is created or updated, or a link is created or deleted, you can push a notification message to the Pub/Sub channel.
Steps to use
- Create a Pub/Sub topic in the customer project.
- Create a rule that triggers the publish to Pub/Sub action. (See the following example.)
- Invoke Document AI Warehouse APIs.
- Verify that messages are published on the Pub/Sub channel.
Example rule
When a document is added under a folder (the CreateLink API is invoked), the following rule can be used to send notification messages to a Pub/Sub topic.
REST
Request:
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/ruleSets \
  -d '{
    "rules": [
      {
        "trigger_type": "ON_CREATE_LINK",
        "condition": "documentType == '\''DriverLicenseFolder'\''",
        "actions": {
          "publish_to_pub_sub": {
            "topic_id": "<topic_name>",
            "messages": "Added document under a folder."
          }
        }
      }
    ]
  }'
Rule details
This action is supported for the following trigger types:
- ON_CREATE: When a new document is created.
- ON_UPDATE: When a document is updated.
- ON_CREATE_LINK: When a new link is created.
- ON_DELETE_LINK: When a link is deleted.
For Create and Update Document triggers, the condition can include attributes of the document being created or updated.
For Create and Delete Link triggers, the condition can only include attributes of the folder document to or from which the document is being added or removed.
The messages field can be used to send a list of messages to the Pub/Sub channel. Along with these messages, the following fields are also published by default:
- Schema name, Document name, Trigger type, RuleSet name, Rule ID, Action ID.
- For Create and Delete Link triggers, the notifications include relevant information about the link that is being added or deleted.
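Because the publish action is limited to these four trigger types, a rule body can be checked before it is sent. The sketch below is a hypothetical client-side helper (`publish_rule` is not part of any Document Warehouse library) that mirrors the shape of the example request and rejects other trigger types.

```python
# Trigger types that the publish action supports, as listed above.
PUBLISH_TRIGGERS = {"ON_CREATE", "ON_UPDATE", "ON_CREATE_LINK", "ON_DELETE_LINK"}


def publish_rule(trigger_type: str, condition: str, topic_id: str, message: str) -> dict:
    """Build a rule that publishes `message` to `topic_id` when `condition` holds."""
    if trigger_type not in PUBLISH_TRIGGERS:
        raise ValueError(f"publish action does not support trigger {trigger_type!r}")
    return {
        "trigger_type": trigger_type,
        "condition": condition,
        "actions": {
            "publish_to_pub_sub": {"topic_id": topic_id, "messages": message}
        },
    }


rule = publish_rule(
    "ON_CREATE_LINK",
    "documentType == 'DriverLicenseFolder'",
    "TOPIC_NAME",  # placeholder for your Pub/Sub topic
    "Added document under a folder.",
)
```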