Security, CMEK, and audit-logs
Security overview
To understand how Document AI keeps your data secure, review the following questions and answers.
How does Google protect and ensure the security of the data I send to Document AI?
Refer to the Google Cloud Security page which describes the security measures in place for Google Cloud Services.
What security controls does Document AI support?
Document AI supports the following:
- Data Residency
- VPC Service Controls (VPC-SC)
- Access Transparency
- Customer-managed encryption keys (CMEK)
Security compliance
This section describes the questions related to compliance.
What compliance does Document AI offer?
Google Cloud undergoes regular independent third-party audits to verify alignment with security, privacy, and compliance controls. Google Cloud has regular audits for standards such as ISO 27001, ISO 27017, ISO 27018, SOC 2, SOC 3, and PCI DSS.
You can read more about Google Cloud compliance on the Compliance resource center.
Is Document AI FedRAMP compliant?
Document AI is FedRAMP Moderate compliant.
Is Document AI HIPAA compliant?
Document AI is HIPAA compliant.
Security data usage
This section answers questions about how your data is used and stored.
Does Google use customer data to improve the model(s)?
No. Google does not use any of your content (such as documents and predictions) for any purpose except to provide you with the Document AI service. See the Document AI data usage policy.
At Google Cloud, we never use customer data to train our Document AI models.
For more information, see this blog post: Sharing our data privacy commitments for the AI era
In the future, will Google share the document I send to Document AI?
We won't make the document that you send available to the public, or share it with anyone else, except as necessary to provide the Document AI service. For example, sometimes we may need to use a third-party vendor to help us provide some aspect of our services, such as storage or transmission of data. Our vendors are under appropriate security and confidentiality contractual obligations. We don't share documents you send with other parties or make them public for any other purpose.
Will documents I send to Document AI, their results, or other information about requests, be stored on Google servers? If so, how long and where, and can I access it?
When you send a document to Document AI using a batch request, we must store that document (encrypted with an ephemeral key, meaning that no human has access to it) for a short period of time in order to perform the analysis and return the results to you. For batch operations, the stored document is typically deleted immediately after processing, with a failsafe time to live (TTL) of one day. If the batch operation ends abnormally, the data may persist with a TTL of up to seven days.
Synchronous processes
For online (immediate response) operations, the document data (sent in the request) is processed in memory, encrypted in flight, and not persisted to disk. Google also temporarily logs some metadata about your Document AI API requests (such as the time the request was received and the size of the request) to improve our service and combat abuse.
Does Google claim ownership of the content I send in the request to Document AI?
Google does not claim any ownership in any of the content (including documents and predictions) that you transmit to Document AI. Documents and custom models are considered to be (private) customer data. We never use customer data to improve our models. In the rare circumstance where both parties agree to such an arrangement, an explicit data sharing agreement is crafted.
What is considered Personally Identifiable Information (PII) that needs to be redacted on documents before being shared with Google?
For document sharing purposes, PII is any information defined as personally identifiable data under applicable laws. Customers must redact PII from documents before sharing them with Google, for example when voluntarily providing a document to technical support to reproduce a problem.
Examples of PII include but are not limited to:
- Date of birth, for example:
2/10/1988
- Names of individuals, for example:
Kiran Darko
- Personal address, for example:
Evergreen terrace 123
- Email address of individuals, for example:
rivelro@test-mail.com
- Telephone number of individuals, for example:
636-555-3226
- Driver's license number
- National ID number
- Employer identification number
- Bank account information: account IDs, routing numbers, SWIFT IDs
- Payment card number
- Gender, for example:
Female, Male, Nonbinary
- Ethnicity, for example:
Berber, Italian, Japanese, Latino, Ukrainian
- Usernames, ID number of third parties
- Passport number, for example:
AA1001111
- Marital status, for example:
Single, Divorced
- Number of allowances or exemptions
- Dependent names
- Vehicle identifiers (VIN, license plates, etc.)
- Any other unique identifying number, characteristic or code of an individual that could identify an individual consumer, family, or device over time or across services
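As a rough illustration of redacting before sharing, the following Python sketch masks three of the example categories above (email address, telephone number, date of birth) in plain text. The patterns, the `PII_PATTERNS` table, and the `redact_pii` helper are assumptions made for this example. They are deliberately simplistic and are not a complete or legally sufficient redaction method.

```python
import re

# Illustrative patterns only (assumption): real redaction needs reviewed,
# locale-aware rules and usually a human check.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each match of the sample patterns with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Born 2/10/1988, reach me at rivelro@test-mail.com or 636-555-3226."
print(redact_pii(sample))
# Prints: Born [DATE], reach me at [EMAIL] or [PHONE].
```

Categories such as names, addresses, or ID numbers generally cannot be caught reliably with simple patterns like these and need separate handling.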
Can I resell the Document AI API?
No, you are not permitted to resell the Document AI service. You can still integrate Document AI into applications of independent value.
How can customers control Google Cloud support access to their documents or data?
All Document AI parsers support Access Transparency and Access Approval. By default, Google support doesn't have access to any customer data or applications. If the Google support team requires access, customers can use the Access Approval process to authorize access to data or applications. This process starts with the creation of a ticket in the Google support portal. The customer then receives a notification (usually by email) with an option to authorize or deny access.
Google also offers a service called Access Transparency, which gives customers visibility into all the tasks that Google support performs while they have access to the system.
CMEK overview
By default, Google Cloud automatically encrypts data when it is at rest using encryption keys managed by Google.
If you have specific compliance or regulatory requirements related to the keys that protect your data, you can use customer-managed encryption keys (CMEK) for Document AI. Instead of Google managing the encryption keys that protect your data, your Document AI processor is protected using a key that you control and manage in Cloud Key Management Service (KMS).
This guide describes CMEK for Document AI. For more information about CMEK in general, including when and why to enable it, see the Cloud Key Management Service documentation.
Using CMEK
Encryption settings are available when you create a processor. To use CMEK, select the CMEK option and select a key.
The CMEK key is used for all data associated with the processor and its child resources. All customer-related data that is sent to the processor is automatically encrypted with the provided key before writing to disk.
Once a processor has been created, you cannot change its encryption settings. To use a different key, you must create a new processor.
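If you create processors programmatically rather than through the console, the key is typically passed as a Cloud KMS resource name. The sketch below only builds the strings and the request body. It assumes the REST `Processor` resource accepts `displayName`, `type`, and `kmsKeyName` fields, and every project, location, key ring, and key value shown is a placeholder. Verify the field names against the Document AI API reference before relying on them.

```python
def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build a Cloud KMS CryptoKey resource name."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def processor_create_body(display_name: str, processor_type: str, key_name: str) -> dict:
    """Request body for creating a CMEK-protected processor.

    Field names are an assumption based on the REST Processor resource.
    """
    return {
        "displayName": display_name,
        "type": processor_type,
        "kmsKeyName": key_name,
    }


# Placeholder values for illustration only.
key = kms_key_name("my-project", "us", "my-ring", "my-key")
body = processor_create_body("invoices-cmek", "OCR_PROCESSOR", key)
print(body)
```

Because encryption settings cannot be changed after creation, it can help to validate the key name in code review before the processor is created.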
External keys
You can use Cloud External Key Manager (EKM) to create and manage external keys to encrypt data within Google Cloud.
When you use a Cloud EKM key, Google has no control over the availability of your externally managed key. If you request access to a resource encrypted with an externally managed key, and the key is unavailable, then Document AI will reject the request. There can be a delay of up to 10 minutes before you can access the resource after the key becomes available.
For more considerations when using external keys, see EKM considerations.
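Because a request can be rejected while an external key is unavailable, and access can lag by up to 10 minutes after the key returns, callers may want to retry with a bounded wait. The following is a minimal sketch of that idea. `KeyUnavailableError` is a hypothetical stand-in for whatever error your client library actually raises, and the delay values are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class KeyUnavailableError(Exception):
    """Hypothetical error type standing in for a key-unavailable rejection."""


def call_with_key_retry(
    request: Callable[[], T],
    max_wait_s: float = 600.0,      # 10 minutes, matching the delay noted above
    initial_delay_s: float = 5.0,   # arbitrary starting delay (assumption)
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` with exponential backoff until it succeeds or
    the total time slept reaches `max_wait_s`."""
    waited = 0.0
    delay = initial_delay_s
    while True:
        try:
            return request()
        except KeyUnavailableError:
            if waited >= max_wait_s:
                raise
            pause = min(delay, max_wait_s - waited)
            sleep(pause)
            waited += pause
            delay *= 2


# Demo with a fake request that fails twice; no real sleeping is done.
attempts = []

def fake_request() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise KeyUnavailableError("external key not reachable")
    return "processed"

print(call_with_key_retry(fake_request, sleep=lambda seconds: None))
```

Injecting `sleep` keeps the helper testable; in production code you would leave the default and map the real client exception to the retry condition.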
CMEK supported resources
When storing any resource to disk, if any customer data is stored as part of the resource, Document AI first encrypts the contents using the CMEK key.
Resource | Material Encrypted |
---|---|
Processor | N/A - no user data. However, if you specify a CMEK key during processor creation, then it must be valid. |
ProcessorVersion | All |
Evaluation | All |
CMEK supported APIs
The APIs that use the CMEK key for encryption include the following:
Method | Encryption |
---|---|
processDocument | N/A - no data saved to disk. |
batchProcessDocuments | Data is temporarily stored on disk and encrypted using an ephemeral key (see CMEK compliance). |
reviewDocument | Documents pending review are stored in a Cloud Storage bucket encrypted using the provided KMS/CMEK key. |
trainProcessorVersion | Documents used for training are encrypted using the provided KMS/CMEK key. |
evaluateProcessorVersion | Evaluations are encrypted using the provided KMS/CMEK key. |
API requests that access encrypted resources fail if the key is disabled or is unreachable. Examples include the following:
Method | Decryption |
---|---|
getProcessorVersion | Processor versions trained using customer data are encrypted. Access requires decryption. |
processDocument | Processing documents using an encrypted processor version requires decryption. |
Import Documents | Importing documents with auto-labeling enabled using an encrypted processor version requires decryption. |
CMEK and Cloud Storage
APIs, such as batchProcess and reviewDocument, can read from and write to Cloud Storage buckets.
Any data written to Cloud Storage by Document AI is encrypted using the bucket's configured encryption key, which can be different than your processor's CMEK key.
For more information, see the CMEK documentation for Cloud Storage.
Audit logs
This document describes the audit logs created by Document AI as part of Cloud Audit Logs.
Overview
Google Cloud services write audit logs to help you answer the questions, "Who did what, where, and when?" within your Google Cloud resources.
Your Google Cloud projects contain only the audit logs for resources that are directly within the Google Cloud project. Other Google Cloud resources, such as folders, organizations, and billing accounts, contain the audit logs for the entity itself.
For a general overview of Cloud Audit Logs, see Cloud Audit Logs overview. For a deeper understanding of the audit log format, see Understand audit logs.
Available audit logs
The following types of audit logs are available for Document AI:
- Admin Activity audit logs
  Includes "admin write" operations that write metadata or configuration information.
  You can't disable Admin Activity audit logs.
- Data Access audit logs
  Includes "admin read" operations that read metadata or configuration information. Also includes "data read" and "data write" operations that read or write user-provided data.
  To receive Data Access audit logs, you must explicitly enable them.
For fuller descriptions of the audit log types, see Types of audit logs.
Audited operations
The following table summarizes which API operations correspond to each audit log type in Document AI:
Audit logs category | Document AI operations |
---|---|
Admin Activity audit logs | humanReviewConfigs.update, operations.cancel, processors.create, processors.delete, processors.disable, processors.enable, processors.setDefaultProcessorVersion, processorVersions.create, processorVersions.delete, processorVersions.deploy, processorVersions.undeploy |
Data Access audit logs | humanReviewConfigs.get, humanReviewConfigs.update, processors.batchProcess, processors.get, processors.list, processors.process, processorVersions.batchProcess, processorVersions.get, processorVersions.list, processorVersions.process |
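The table above can be encoded as a small lookup, which is handy when checking which log type to search for a given operation. This sketch copies the operation names from the table; the `audit_categories` helper is an assumption made for this example, not part of any Google library.

```python
from typing import List

# Operation names copied from the table above.
ADMIN_ACTIVITY = {
    "humanReviewConfigs.update", "operations.cancel",
    "processors.create", "processors.delete", "processors.disable",
    "processors.enable", "processors.setDefaultProcessorVersion",
    "processorVersions.create", "processorVersions.delete",
    "processorVersions.deploy", "processorVersions.undeploy",
}
DATA_ACCESS = {
    "humanReviewConfigs.get", "humanReviewConfigs.update",
    "processors.batchProcess", "processors.get", "processors.list",
    "processors.process", "processorVersions.batchProcess",
    "processorVersions.get", "processorVersions.list",
    "processorVersions.process",
}


def audit_categories(operation: str) -> List[str]:
    """Return the audit log categories the table lists for an operation."""
    categories = []
    if operation in ADMIN_ACTIVITY:
        categories.append("Admin Activity")
    if operation in DATA_ACCESS:
        categories.append("Data Access")
    return categories


print(audit_categories("processors.create"))
print(audit_categories("processors.process"))
```

Note that the table lists humanReviewConfigs.update under both categories, so the helper returns both for that operation.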
Audit log format
Audit log entries include the following objects:
- The log entry itself, which is an object of type LogEntry. Useful fields include the following:
  - The logName contains the resource ID and audit log type.
  - The resource contains the target of the audited operation.
  - The timestamp contains the time of the audited operation.
  - The protoPayload contains the audited information.
- The audit logging data, which is an AuditLog object held in the protoPayload field of the log entry.
- Optional service-specific audit information, which is a service-specific object. For earlier integrations, this object is held in the serviceData field of the AuditLog object; later integrations use the metadata field.
For other fields in these objects, and how to interpret them, review Understand audit logs.
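To make the structure concrete, the following sketch pulls the fields described above out of a log entry represented as a Python dict (for example, one parsed from exported JSON). The sample entry is entirely hypothetical: the project, principal, IP address, and method name are invented for illustration. In the JSON form of a log entry, the time field's key is `timestamp` and AuditLog fields use camelCase keys.

```python
def summarize_audit_entry(entry: dict) -> dict:
    """Extract commonly used fields from an audit LogEntry dict."""
    payload = entry.get("protoPayload", {})
    return {
        # The audit log type is the part of logName after the encoded slash.
        "log_type": entry.get("logName", "").rsplit("%2F", 1)[-1],
        "resource_type": entry.get("resource", {}).get("type"),
        "timestamp": entry.get("timestamp"),
        "service": payload.get("serviceName"),
        "method": payload.get("methodName"),
        "principal": payload.get("authenticationInfo", {}).get("principalEmail"),
        "caller_ip": payload.get("requestMetadata", {}).get("callerIp"),
    }


# Hypothetical entry; all values are placeholders.
sample_entry = {
    "logName": "projects/my-project/logs/cloudaudit.googleapis.com%2Factivity",
    "resource": {"type": "audited_resource"},
    "timestamp": "2024-01-01T00:00:00Z",
    "protoPayload": {
        "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
        "serviceName": "documentai.googleapis.com",
        "methodName": "google.cloud.documentai.v1.DocumentProcessorService.CreateProcessor",
        "authenticationInfo": {"principalEmail": "user@example.com"},
        "requestMetadata": {"callerIp": "203.0.113.7"},
    },
}

print(summarize_audit_entry(sample_entry))
```

Using `.get` with defaults means redacted or missing fields come back as `None` instead of raising an error.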
Log name
Cloud Audit Logs log names include resource identifiers indicating the Google Cloud project or other Google Cloud entity that owns the audit logs, and whether the log contains Admin Activity, Data Access, Policy Denied, or System Event audit logging data.
The following are the audit log names, including variables for the resource identifiers:
projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity
projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access
projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy
folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Factivity
folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Fdata_access
folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Fpolicy
billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Factivity
billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Fdata_access
billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Fpolicy
organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Factivity
organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Fdata_access
organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Fpolicy
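The `%2F` in these names is a URL-encoded slash in the log ID (`cloudaudit.googleapis.com/activity`, and so on). A small helper can build the names consistently. The `audit_log_name` function and the `my-project` ID below are assumptions made for this sketch.

```python
from urllib.parse import quote

PARENT_TYPES = ("projects", "folders", "billingAccounts", "organizations")
AUDIT_LOG_IDS = ("activity", "data_access", "system_event", "policy")


def audit_log_name(parent_type: str, parent_id: str, log_id: str) -> str:
    """Build a full audit log name with the log ID URL-encoded."""
    if parent_type not in PARENT_TYPES:
        raise ValueError(f"unknown parent type: {parent_type!r}")
    if log_id not in AUDIT_LOG_IDS:
        raise ValueError(f"unknown audit log ID: {log_id!r}")
    # safe="" forces the slash to be encoded as %2F.
    encoded = quote(f"cloudaudit.googleapis.com/{log_id}", safe="")
    return f"{parent_type}/{parent_id}/logs/{encoded}"


for log_id in AUDIT_LOG_IDS:
    print(audit_log_name("projects", "my-project", log_id))
```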
Service name
Document AI audit logs use the service name documentai.googleapis.com.
For a list of all the Cloud Logging API service names and their corresponding monitored resource type, see Map services to resources.
Resource types
Document AI audit logs use the resource type audited_resource for all audit logs.
For a list of all the Cloud Logging monitored resource types and descriptive information, see Monitored resource types.
Caller identities
The IP address of the caller is held in the RequestMetadata.caller_ip field of the AuditLog object. Logging might redact certain caller identities and IP addresses.
For details about which information is redacted in audit logs, see Caller identities in audit logs.
Enable audit logging
Admin Activity audit logs are always enabled; you can't disable them.
Data Access audit logs are disabled by default and aren't written unless explicitly enabled (the exception is Data Access audit logs for BigQuery, which can't be disabled).
For information about enabling some or all of your Data Access audit logs, see Enable Data Access audit logs.
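Data Access audit logs are configured through the `auditConfigs` section of a resource's IAM policy. The sketch below shows one way to add a Document AI entry to a policy held as a Python dict, for example one fetched with `getIamPolicy`. The `enable_data_access_logs` helper is an assumption for illustration. It replaces any existing entry for the service, including its exempted members, so treat it as a starting point and not as a safe merge.

```python
import copy

DATA_ACCESS_LOG_TYPES = ("ADMIN_READ", "DATA_READ", "DATA_WRITE")


def enable_data_access_logs(policy: dict, service: str = "documentai.googleapis.com") -> dict:
    """Return a copy of `policy` with all Data Access log types enabled
    for `service`. Any existing config for that service is replaced."""
    updated = copy.deepcopy(policy)
    configs = [
        config for config in updated.get("auditConfigs", [])
        if config.get("service") != service
    ]
    configs.append({
        "service": service,
        "auditLogConfigs": [{"logType": t} for t in DATA_ACCESS_LOG_TYPES],
    })
    updated["auditConfigs"] = configs
    return updated


# Hypothetical starting policy; the etag value is a placeholder.
policy = {"bindings": [], "etag": "BwXXXXXXXXX="}
print(enable_data_access_logs(policy)["auditConfigs"])
```

The updated policy would then be written back with `setIamPolicy`, keeping the original `etag` so concurrent changes are detected.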
Permissions and roles
IAM permissions and roles determine your ability to access audit logs data in Google Cloud resources.
When deciding which Logging-specific permissions and roles apply to your use case, consider the following:
- The Logs Viewer role (roles/logging.viewer) gives you read-only access to Admin Activity, Policy Denied, and System Event audit logs. If you have just this role, you cannot view Data Access audit logs that are in the _Default bucket.
- The Private Logs Viewer role (roles/logging.privateLogViewer) includes the permissions contained in roles/logging.viewer, plus the ability to read Data Access audit logs in the _Default bucket.
  Note that if these private logs are stored in user-defined buckets, then any user who has permissions to read logs in those buckets can read the private logs. For more information about log buckets, see Routing and storage overview.
For more information about the IAM permissions and roles that apply to audit logs data, see Access control with IAM.
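The two roles described above can be summarized as a lookup from role to the audit log types it can read in the _Default bucket. This is only a restatement of the preceding text in code. The `can_view_audit_log` helper is an assumption for this sketch, and it ignores custom roles, other predefined roles, and user-defined buckets.

```python
from typing import Iterable

# Audit log types readable in the _Default bucket, per the text above.
ROLE_LOG_ACCESS = {
    "roles/logging.viewer": {"activity", "policy", "system_event"},
    "roles/logging.privateLogViewer": {
        "activity", "policy", "system_event", "data_access",
    },
}


def can_view_audit_log(roles: Iterable[str], log_id: str) -> bool:
    """True if any of the given roles can read `log_id` in _Default."""
    return any(log_id in ROLE_LOG_ACCESS.get(role, set()) for role in roles)


print(can_view_audit_log(["roles/logging.viewer"], "data_access"))
# Prints: False
print(can_view_audit_log(["roles/logging.privateLogViewer"], "data_access"))
# Prints: True
```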
View logs
You can query for all audit logs or you can query for logs by their audit log name. The audit log name includes the resource identifier of the Google Cloud project, folder, billing account, or organization for which you want to view audit logging information. Your queries can specify indexed LogEntry fields, and if you use the Log Analytics page, which supports SQL queries, then you can view your query results as a chart.
For more information about querying your logs, see the following pages:
- Build queries in the Logs Explorer.
- Query and view logs in Log Analytics.
- Sample queries for security insights.
Console
In the Google Cloud console, you can use the Logs Explorer to retrieve your audit log entries for your Google Cloud project, folder, or organization:
- In the Google Cloud console, go to the Logs Explorer page.
  If you use the search bar to find this page, then select the result whose subheading is Logging.
- Select an existing Google Cloud project, folder, or organization.
- To display all audit logs, enter either of the following queries into the query-editor field, and then click Run query:
  logName:"cloudaudit.googleapis.com"
  protoPayload."@type"="type.googleapis.com/google.cloud.audit.AuditLog"
- To display the audit logs for a specific resource and audit log type, in the Query builder pane, do the following:
  - In Resource type, select the Google Cloud resource whose audit logs you want to see.
  - In Log name, select the audit log type that you want to see:
    - For Admin Activity audit logs, select activity.
    - For Data Access audit logs, select data_access.
    - For System Event audit logs, select system_event.
    - For Policy Denied audit logs, select policy.
  - Click Run query.
  If you don't see these options, then there aren't any audit logs of that type available in the Google Cloud project, folder, or organization.
If you're experiencing issues when trying to view logs in the Logs Explorer, see the troubleshooting information.
For more information about querying by using the Logs Explorer, see Build queries in the Logs Explorer. For information about summarizing log entries in the Logs Explorer by using Gemini, see Summarize log entries with Gemini assistance.
gcloud
The Google Cloud CLI provides a command-line interface to the Logging API. Supply a valid resource identifier in each of the log names. For example, if your query includes a PROJECT_ID, then the project identifier you supply must refer to the currently selected Google Cloud project.
To read your Google Cloud project-level audit log entries, run the following command:
gcloud logging read "logName : projects/PROJECT_ID/logs/cloudaudit.googleapis.com" \
    --project=PROJECT_ID
To read your folder-level audit log entries, run the following command:
gcloud logging read "logName : folders/FOLDER_ID/logs/cloudaudit.googleapis.com" \
    --folder=FOLDER_ID
To read your organization-level audit log entries, run the following command:
gcloud logging read "logName : organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com" \
    --organization=ORGANIZATION_ID
To read your Cloud Billing account-level audit log entries, run the following command:
gcloud logging read "logName : billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com" \
    --billing-account=BILLING_ACCOUNT_ID
Add the --freshness flag to your command to read logs that are more than 1 day old.
For more information about using the gcloud CLI, see gcloud logging read.
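If you run these commands from a script, it can be convenient to assemble the argument list in code. The following sketch builds the same `gcloud logging read` invocations shown above, optionally with `--freshness`. The `gcloud_read_command` helper and the `my-project` ID are assumptions for this example, and the block prints the command without running it.

```python
import shlex
from typing import List, Optional

# Flag used for each kind of parent resource, as in the commands above.
PARENT_FLAGS = {
    "projects": "--project",
    "folders": "--folder",
    "organizations": "--organization",
    "billingAccounts": "--billing-account",
}


def gcloud_read_command(
    parent_type: str, parent_id: str, freshness: Optional[str] = None
) -> List[str]:
    """Build the argument list for reading audit logs of one parent."""
    flag = PARENT_FLAGS[parent_type]
    log_filter = f"logName : {parent_type}/{parent_id}/logs/cloudaudit.googleapis.com"
    args = ["gcloud", "logging", "read", log_filter, f"{flag}={parent_id}"]
    if freshness:
        # For example "7d" to look back seven days.
        args.append(f"--freshness={freshness}")
    return args


args = gcloud_read_command("projects", "my-project", freshness="7d")
print(shlex.join(args))
```

To execute the command, you could pass the list to `subprocess.run(args, check=True)`, which avoids shell quoting problems with the filter string.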
API
When building your queries, supply a valid resource identifier in each of the log names. For example, if your query includes a PROJECT_ID, then the project identifier you supply must refer to the currently selected Google Cloud project.
For example, to use the Logging API to view your project-level audit log entries, do the following:
- Go to the Try this API section in the documentation for the entries.list method.
- Put the following into the Request body part of the Try this API form. Clicking this prepopulated form automatically fills the request body, but you need to supply a valid PROJECT_ID in each of the log names.
  {
    "resourceNames": [
      "projects/PROJECT_ID"
    ],
    "pageSize": 5,
    "filter": "logName : projects/PROJECT_ID/logs/cloudaudit.googleapis.com"
  }
- Click Execute.
To limit the results to Document AI audit logs in a project, you can use a filter such as the following:
logName=("projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity" OR
  "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access" OR
  "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event" OR
  "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy")
protoPayload.serviceName="documentai.googleapis.com"
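The same kind of request body can be generated in code before it is sent to the entries.list method. This sketch builds a body that restricts results to Document AI by adding a `protoPayload.serviceName` clause to the audit log filter. The `documentai_audit_request` helper and the `my-project` ID are assumptions for illustration, and the block does not make any network call.

```python
import json


def documentai_audit_request(project_id: str, page_size: int = 5) -> dict:
    """Build an entries.list request body for Document AI audit logs."""
    log_filter = (
        f'logName:"projects/{project_id}/logs/cloudaudit.googleapis.com" '
        'AND protoPayload.serviceName="documentai.googleapis.com"'
    )
    return {
        "resourceNames": [f"projects/{project_id}"],
        "pageSize": page_size,
        "filter": log_filter,
    }


body = documentai_audit_request("my-project")
print(json.dumps(body, indent=2))
```

The resulting JSON could be pasted into the Try this API form or sent as the body of an authenticated POST to the Logging API's entries:list endpoint.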
Route audit logs
You can route audit logs to supported destinations in the same way that you can route other kinds of logs. Here are some reasons you might want to route your audit logs:
- To keep audit logs for a longer period of time or to use more powerful search capabilities, you can route copies of your audit logs to Cloud Storage, BigQuery, or Pub/Sub. Using Pub/Sub, you can route to other applications, other repositories, and to third parties.
- To manage your audit logs across an entire organization, you can create aggregated sinks that can route logs from any or all Google Cloud projects in the organization.
- If your enabled Data Access audit logs are pushing your Google Cloud projects over your log allotments, you can create sinks that exclude the Data Access audit logs from Logging.
For instructions about routing logs, see Route logs to supported destinations.
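As a rough sketch of what a sink definition might look like, the following builds a body shaped like a Logging API `LogSink` that routes audit logs to a destination and can optionally exclude Data Access audit logs. The sink name, dataset, and project ID are placeholders, and the `audit_sink_body` helper is an assumption for this example. Check the routing documentation for the exact fields and destination formats your setup needs.

```python
def audit_sink_body(
    name: str,
    destination: str,
    project_id: str,
    exclude_data_access: bool = False,
) -> dict:
    """Build a LogSink-style body that routes a project's audit logs."""
    body = {
        "name": name,
        "destination": destination,
        # Match every audit log type.
        "filter": 'logName:"cloudaudit.googleapis.com"',
    }
    if exclude_data_access:
        body["exclusions"] = [{
            "name": "exclude-data-access",
            "filter": (
                f'logName="projects/{project_id}/logs/'
                'cloudaudit.googleapis.com%2Fdata_access"'
            ),
        }]
    return body


# Placeholder destination: a BigQuery dataset in the same project.
sink = audit_sink_body(
    "documentai-audit-sink",
    "bigquery.googleapis.com/projects/my-project/datasets/audit_logs",
    "my-project",
    exclude_data_access=True,
)
print(sink)
```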
Pricing
For more information about pricing, see Cloud Logging pricing summary.