This guide provides all required setup steps to start using Document AI Warehouse.
About the Google Cloud console
The Google Cloud console is a web UI used to provision, configure, manage, and monitor systems that use Google Cloud products. You use the Google Cloud console to set up and manage Document AI Warehouse resources.
Create a project
To use services provided by Google Cloud, you must create a project.
A project organizes all your Google Cloud resources. A project consists of the following components:
- A set of collaborators
- Enabled APIs (and other resources)
- Monitoring tools
- Billing information
- Authentication and access controls
You can create one project, or you can create multiple projects. You can use your projects to organize your Google Cloud resources in a resource hierarchy. For more information about projects, see the Resource Manager documentation.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Enable billing
A billing account defines who pays for a given set of resources. Billing accounts can be linked to one or more projects. Project usage is charged to the linked billing account. You can configure billing when you create a project. For more information, see the Billing documentation.
Make sure that billing is enabled for your Google Cloud project.
Provision and initialize the service
Before using Document AI Warehouse for the first time, you must provision and initialize the resources associated with your project on the Document AI Warehouse Provisioning page.
To provision the resources, you must be granted the Content Warehouse Admin and Service Usage Admin roles on the project.
Provisioning steps
Choose a region.
On the provisioning page, select the region that you want to enable.
Each region is independent. Therefore, if you want to use more than one region, provision each region separately.
Enable the core API:
Click Enable. This enables the Document AI Warehouse APIs on your project.
After the API is enabled, click Next.
Provision the instance.
This step provisions the resource for your project in the Document AI Warehouse service. You must choose one of three access control modes. Review them carefully to select the right mode for your use case. For more information, see the access control mode page.
Select an access control (ACL) mode.
[Recommended] Document-level access control with users in Cloud Identity.
This mode applies if your organization manages users or groups in the Cloud Identity service.
- The Document AI Warehouse interface supports this mode to authenticate users.
- Your organization's LDAP and Active Directory users and groups can be synced to Cloud Identity.
- Google Workspace users can be easily added to Cloud Identity.
Document-level access control with users in Bring-your-own Identity service access control.
If your users can't be added or synced to Cloud Identity, use this mode. However:
- The Document AI Warehouse interface doesn't support this mode; a custom client application might be needed.
- Your custom client application authenticates users against the identity provider and passes the users and group memberships using the Document AI Warehouse API.
Universal access: No document-level access control.
- The Document AI Warehouse interface supports this mode to authenticate users.
- This mode is typically used to grant access to public users without requiring authentication.
- Custom portals can access all documents by using a service account with the desired role (for example, the Document Viewer role) and can relay this access to public users without authentication.
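The choice among the three modes can be sketched as a small decision helper. This is an illustration only: `AclMode`, `suggest_acl_mode`, and `ui_supported` are hypothetical names, not part of any Document AI Warehouse API, and the two yes/no questions are a simplification of the considerations listed above.

```python
from enum import Enum


class AclMode(Enum):
    """The three access control modes offered on the provisioning page."""

    CLOUD_IDENTITY = "Document-level access control with users in Cloud Identity"
    BYO_IDENTITY = "Document-level access control with users in Bring-your-own Identity service"
    UNIVERSAL = "Universal access: No document-level access control"


def suggest_acl_mode(needs_document_level_acl: bool, users_in_cloud_identity: bool) -> AclMode:
    """Hypothetical helper: map two yes/no answers to an ACL mode."""
    if not needs_document_level_acl:
        return AclMode.UNIVERSAL
    if users_in_cloud_identity:
        # Recommended mode when users or groups live in Cloud Identity.
        return AclMode.CLOUD_IDENTITY
    # Users can't be added or synced to Cloud Identity.
    return AclMode.BYO_IDENTITY


def ui_supported(mode: AclMode) -> bool:
    """The built-in interface supports every mode except bring-your-own identity."""
    return mode is not AclMode.BYO_IDENTITY
```

For example, `suggest_acl_mode(True, False)` returns `AclMode.BYO_IDENTITY`, for which `ui_supported` is `False`, so a custom client application might be needed.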
Feature | Document-level access control with users in Cloud Identity | Document-level access control with users in Bring-your-own Identity service access control | Universal access |
---|---|---|---|
Document-level access | Yes | Yes | No |
Document AI Warehouse UI support | Yes | No | Yes (if users have project-level access) |
Enable questioning and answering:
Check Question & Answering if you want to enable GenAI search in your project. See GenAI Search for more information, including how to get allowlisted to use the feature.
Trigger provisioning:
Click Provision to start provisioning your project. Setting up the instance takes about 3 to 5 minutes.
Create a default schema.
Click Create in the initialization step. This creates a default schema that can be used for OCR-extracted PDFs or TXT files. It contains the raw text field for indexing but doesn't contain properties.
View instance:
This completes the provisioning process. If your project uses document-level access control, proceed to the next section to set up project-level permissions.
If you are on the allowlist for Google Cloud console UI features, click Get Started to start using Document AI Warehouse in the Google Cloud console.
If you aren't on the allowlist for Google Cloud console UI features, proceed to configure the web application to learn how to set up the Document AI Warehouse web application.
Configure the required permissions in IAM for your users. If document-level access control is enabled, then project-level permissions and IAM permissions are required. See required permissions for more details.
Set up project-level permissions
If your project enables document-level access control (Option 1 in ACL mode selection), you must grant project-level permissions to your administrator account and to your users.
To do that, in the final view after provisioning, go to Project Permissions:
Follow the steps below to add your admin account as a Document Admin:
Click Add User
Enter your admin's email, and choose Document Admin as the access level. Click Save.
For other users, you can add them as:
Document Admin: A role with full access to all of the documents in the project, including uploading documents and viewing/editing/deleting all documents regardless of the document owners. In addition, document admins can change the permissions of all the documents.
Document Editor: A role with viewing and editing permissions to all documents. Document editors can't create or delete documents in the project and can't change the permissions of documents.
Document Viewer: A role with only the viewing permissions to all documents. Document viewers can't create, edit, delete, or change permissions of documents.
Document Creator: A role with only document uploading permissions. Document creators have full permissions to the documents they upload, but have no other permissions to any other documents unless they get explicit permissions for those documents.
The email can be either a single user email or a group email. Choose Group in the Type field when specifying a group email.
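The four access levels can be summarized as a lookup table. The sketch below is only a reading aid built from the role descriptions above; the action names and the `is_allowed` helper are hypothetical and do not correspond to any Document AI Warehouse API. It also ignores explicit per-document grants, which can give a user extra permissions on individual documents.

```python
# Actions named in the role descriptions (labels are illustrative).
ALL_ACTIONS = frozenset({"create", "view", "edit", "delete", "set_permissions"})

# Project-wide actions each access level allows on any document.
PROJECT_ROLE_ACTIONS = {
    "Document Admin": ALL_ACTIONS,
    "Document Editor": frozenset({"view", "edit"}),
    "Document Viewer": frozenset({"view"}),
    "Document Creator": frozenset({"create"}),
}


def is_allowed(role: str, action: str, owns_document: bool = False) -> bool:
    """Return True if the access level permits the action (hypothetical helper)."""
    if role not in PROJECT_ROLE_ACTIONS:
        raise ValueError(f"unknown access level: {role}")
    if action not in ALL_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # Document creators have full permissions on documents they uploaded.
    if role == "Document Creator" and owns_document:
        return True
    return action in PROJECT_ROLE_ACTIONS[role]
```

For example, `is_allowed("Document Editor", "delete")` is `False`, while `is_allowed("Document Creator", "delete", owns_document=True)` is `True`.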
Required permissions
Document AI Warehouse has its own ACL system, independent of and layered on top of IAM. For projects with document-level ACLs, users need additional project-level permissions in the Document AI Warehouse ACL system. For universal access projects, only IAM permissions are required.
The following tables summarize the required permissions:
Document-ACL projects
User type | IAM role | Document AI Warehouse's project-level permissions |
---|---|---|
Admin users | Content Warehouse Admin | Document Admin |
Normal users | Content Warehouse document Schema Viewer | Document Creator/Editor/Viewer, depending on the intended permissions |
Universal access projects
User type | IAM role |
---|---|
Admin users | 1. Content Warehouse Admin 2. Content Warehouse document admin |
Normal users | 1. Content Warehouse document Schema Viewer 2. Content Warehouse document creator/viewer/editor, depending on the intended permissions |
IAM roles for universal access projects
Role Title | Role name | Purpose |
---|---|---|
Content Warehouse document creator | contentwarehouse.documentCreator | Creating documents |
Content Warehouse document viewer | contentwarehouse.documentViewer | Viewing any documents |
Content Warehouse document editor | contentwarehouse.documentEditor | Editing any documents (does not include creating and deleting) |
Content Warehouse document admin | contentwarehouse.documentAdmin | Managing any documents (including creating and deleting) |
Content Warehouse Admin | contentwarehouse.admin | Managing any documents as well as schemas and rules |
See IAM roles and permissions for further details.
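As a quick reference, the two summary tables can be encoded as one lookup keyed by project type and user type. This is a sketch for illustration: the keys `"document-acl"`, `"universal"`, `"admin"`, and `"normal"` and the `required_roles` helper are made-up names, and the values are the role titles exactly as they appear in the tables.

```python
# (project type, user type) -> roles needed in IAM and in the Warehouse ACL system.
REQUIRED_ROLES = {
    ("document-acl", "admin"): {
        "iam": ["Content Warehouse Admin"],
        "project_level": ["Document Admin"],
    },
    ("document-acl", "normal"): {
        "iam": ["Content Warehouse document Schema Viewer"],
        # One of these, depending on the intended permissions.
        "project_level": ["Document Creator/Editor/Viewer"],
    },
    ("universal", "admin"): {
        "iam": ["Content Warehouse Admin", "Content Warehouse document admin"],
        "project_level": [],  # universal access projects use IAM only
    },
    ("universal", "normal"): {
        "iam": [
            "Content Warehouse document Schema Viewer",
            # One of these, depending on the intended permissions.
            "Content Warehouse document creator/viewer/editor",
        ],
        "project_level": [],
    },
}


def required_roles(project_type: str, user_type: str) -> dict:
    """Look up the roles a user needs (hypothetical helper)."""
    try:
        return REQUIRED_ROLES[(project_type, user_type)]
    except KeyError:
        raise ValueError(f"unknown combination: {project_type!r}, {user_type!r}") from None
```

An empty `project_level` list reflects that universal access projects need no permissions in the Document AI Warehouse ACL system.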
Set up the access token (for calling the API from the command line)
To call the Document AI Warehouse API with command line tools, follow these steps.
Use the service account key file in your environment
Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. This variable applies only to your current shell session. If you want the variable to apply to future shell sessions, set the variable in your shell startup file, for example in the ~/.bashrc or ~/.profile file.
Linux or macOS
export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your credentials.
For example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
Windows
For PowerShell:
$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your credentials.
For example:
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"
For command prompt:
set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH
Replace KEY_PATH with the path of the JSON file that contains your credentials.
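Whichever shell you use, it can help to confirm that the variable is set and points at a usable key file before calling the API. The following is a minimal sketch using only the Python standard library. It assumes the file is a service account key, which contains a `type` field set to `service_account` and a `client_email` field; `check_credentials_env` is a hypothetical helper name.

```python
import json
import os


def check_credentials_env(environ=os.environ) -> str:
    """Validate GOOGLE_APPLICATION_CREDENTIALS and return the key's client email."""
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"credentials file not found: {key_path}")
    with open(key_path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    # Assumption: the file is a service account key, not another credential type.
    if key.get("type") != "service_account":
        raise ValueError("credentials file is not a service account key")
    return key.get("client_email", "")


if __name__ == "__main__":
    print("Using service account:", check_credentials_env())
```

The check only reads the file locally. It does not verify that the key is still valid or that the service account has the required roles.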
Install and initialize the Google Cloud CLI (optional)
The gcloud CLI provides a set of tools that you can use to manage resources and applications hosted on Google Cloud.
Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
Generate the access token
If you have set up authentication in previous steps, you can use the Google Cloud CLI to test your authentication environment. Execute the following command and verify that no error occurs and that credentials are returned:
AUTH_TOKEN=$(gcloud auth application-default print-access-token --scopes=https://www.googleapis.com/auth/cloud-platform)
Expect that the AUTH_TOKEN is set, for example:
$ echo $AUTH_TOKEN
ya29.c.b0AXv0zTPvXmEMZXCe781qL0Y3r1EKnw3k4DJcoWGZkyWKx-nMNVQVErQ3ge6XA2RXsTU1tf_SMLgeWC6xwS51tP8QZhbypuGczBzMgKWYExwATHt3Vn553edl8tmqCMjROgdQjCDd8i7as-236r4d8gNwKsR192gNgNw_0zzs0MPyNVmqydpfmpj8yBwJI5QWna1331GTGKgd3Ia16fTzAHrZC_GkcO0wJPo...
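If you script this step, the same gcloud command can be run from Python. The sketch below assumes the gcloud CLI is installed and on your PATH. `gcloud_token_command` and `fetch_access_token` are hypothetical helper names, and the `run` parameter exists only so the function can be exercised without invoking gcloud.

```python
import subprocess

CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def gcloud_token_command(scope: str = CLOUD_PLATFORM_SCOPE) -> list:
    """Build the same command line that is used above to set AUTH_TOKEN."""
    return [
        "gcloud",
        "auth",
        "application-default",
        "print-access-token",
        f"--scopes={scope}",
    ]


def fetch_access_token(run=subprocess.run) -> str:
    """Run gcloud and return the access token it prints."""
    # check=True raises CalledProcessError if gcloud exits with an error.
    result = run(gcloud_token_command(), capture_output=True, text=True, check=True)
    token = result.stdout.strip()
    if not token:
        raise RuntimeError("gcloud returned an empty access token")
    return token
```

Access tokens are short-lived, so fetch a fresh one when a call starts failing with an authentication error rather than storing the value.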
Test calling the Document AI Warehouse API
The AUTH_TOKEN is used by all Document AI Warehouse API REST samples to authenticate API calls. For example, the following command retrieves all the document schemas you defined that are associated with your project (for most cases, use "us" as the location):
curl --header "Authorization: Bearer $AUTH_TOKEN" https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas
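The same request can be made from Python without extra dependencies. This is a minimal sketch using the standard library's `urllib`, not the official client library. The endpoint and header mirror the curl command above, while `build_list_schemas_request` and `list_document_schemas` are hypothetical helper names.

```python
import json
import urllib.request

API_ROOT = "https://contentwarehouse.googleapis.com/v1"


def build_list_schemas_request(project_number: str, location: str, auth_token: str) -> urllib.request.Request:
    """Build the GET request that lists document schemas for a project."""
    url = f"{API_ROOT}/projects/{project_number}/locations/{location}/documentSchemas"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {auth_token}"},
        method="GET",
    )


def list_document_schemas(project_number: str, location: str, auth_token: str) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_list_schemas_request(project_number, location, auth_token)
    # urlopen raises urllib.error.HTTPError on 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To try it, pass your project number, a location such as "us", and the token value. If you read the token from the environment, note that AUTH_TOKEN as set earlier is a shell variable and must be exported before a child process can see it.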
Code Samples
Java
For more information, see the Document AI Warehouse Java API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
For more information, see the Document AI Warehouse Node.js API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
For more information, see the Document AI Warehouse Python API reference documentation.
To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Next steps
If you use the web application (preview), proceed to Administer the Document AI Warehouse UI to set up the UI for Document AI Warehouse.
Proceed to Manage access control to understand and configure the access control.
Proceed to Manage document schemas to learn how to manage the document schemas.
Proceed to Search documents to learn how to search documents.