Quotas

This document lists the quotas and system limits that apply to Document AI.

  • Quotas have default values, but you can typically request adjustments.
  • System limits are fixed values that can't be changed.

Google Cloud uses quotas to help ensure fairness and reduce spikes in resource use and availability. A quota restricts how much of a Google Cloud resource your Google Cloud project can use. Quotas apply to a range of resource types, including hardware, software, and network components. For example, quotas can restrict the number of API calls to a service, the number of load balancers used concurrently by your project, or the number of projects that you can create. Quotas protect the community of Google Cloud users by preventing the overloading of services. Quotas also help you to manage your own Google Cloud resources.

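The Cloud Quotas system does the following:

  • Monitors your consumption of Google Cloud products and services
  • Restricts your consumption of those resources
  • Provides a way to request changes to the quota value and automate quota adjustments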

In most cases, when you attempt to consume more of a resource than its quota allows, the system blocks access to the resource, and the task that you're trying to perform fails.
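Because a request that exceeds quota fails rather than queues, callers usually retry after a delay. The following is a minimal sketch of exponential backoff, not an official client pattern. `QuotaExceededError` is a hypothetical stand-in for whatever quota error your client surfaces (for example, an HTTP 429 response), and the attempt count and delays are illustrative assumptions.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a quota error raised by a client library."""


def call_with_backoff(task, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying with exponential backoff when quota is exceeded."""
    for attempt in range(max_attempts):
        try:
            return task()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the failure to the caller.
            # Wait base_delay * 2^attempt seconds, plus up to 1 second of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

The `sleep` parameter is injectable so that the delay logic can be exercised without actually waiting.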

Quotas generally apply at the Google Cloud project level. Your use of a resource in one project doesn't affect your available quota in another project. Within a Google Cloud project, quotas are shared across all applications and IP addresses.

This document lists the quotas that apply to Document AI.

Service Tiers

Document AI supports two service tiers and associated quotas for online process requests to Generative AI-powered processor versions: provisioned and best effort tiers.

The provisioned tier quota provides 120 pages per minute for base processor versions, such as custom extractor v1.4 and v1.5, and 30 pages per minute for Pro processor versions, such as custom extractor v1.5 Pro.

The best effort tier quota provides 120 pages per minute for base processor versions, such as custom extractor v1.4 and v1.5, and 60 pages per minute for Pro processor versions, such as custom extractor v1.5 Pro. It is used only after the provisioned quota has been exhausted. In the console, these quotas appear as BestEffortOnlineProcessDocumentPagesPerMinutePerProjectUS (metric best_effort_online_process_document_pages_us) and BestEffortOnlineProcessDocumentPagesPerMinutePerProjectEU (metric best_effort_online_process_document_pages_eu).

Tier (pages per minute) | Custom extractor v1.4 (based on Gemini 2.0 Flash) | Custom extractor v1.5 (based on Gemini 2.5 Flash) | Custom extractor v1.5 Pro (based on Gemini 2.5 Pro)
Provisioned | 120 | 120 | 30
Best effort | 120 | 120 | 60
Organization-level provisioned | 240 | 240 | 60
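To make the tier ordering concrete, the sketch below splits one minute of page demand across the two project-level tiers, drawing on best effort only after provisioned quota is used up. The version keys and the `split_pages` helper are hypothetical names for illustration. The numbers are the defaults from the table above, and the model ignores organization-level quota and any capacity reservation.

```python
# Default per-minute page quotas at the project level, from the table above.
TIER_QUOTAS = {
    "custom-extractor-v1.4": {"provisioned": 120, "best_effort": 120},
    "custom-extractor-v1.5": {"provisioned": 120, "best_effort": 120},
    "custom-extractor-v1.5-pro": {"provisioned": 30, "best_effort": 60},
}


def split_pages(version, pages_this_minute):
    """Return (provisioned, best_effort, over_quota) page counts for one minute."""
    quota = TIER_QUOTAS[version]
    # Provisioned quota is consumed first.
    provisioned = min(pages_this_minute, quota["provisioned"])
    remaining = pages_this_minute - provisioned
    # Best effort quota is used only for what provisioned could not cover.
    best_effort = min(remaining, quota["best_effort"])
    over_quota = remaining - best_effort
    return provisioned, best_effort, over_quota


print(split_pages("custom-extractor-v1.5-pro", 100))  # (30, 60, 10)
```

In this example, 100 pages sent to a v1.5 Pro processor version in one minute would use all 30 provisioned pages and all 60 best effort pages, leaving 10 pages over quota.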

If you need more than the listed best effort quotas, you can make a quota increase request (QIR) by contacting your sales team representative.

To secure more capacity during high-volume traffic, read the section on how to make a capacity reservation request.

There is no service level agreement for the best effort tier.

Quotas list

The following quotas apply to Document AI. These quotas apply to each Google Cloud project and are shared across all applications and IP addresses using that project.

If you would like to process more requests, submit a Document AI quota request for your project in the Google Cloud console.

Provide information about your specific needs and use case in the request.

Request Quota | Default Value | Notes
Requests per minute | 1800 per user | View quota in Google Cloud console
Online process requests per minute (US) | 120 per project per processor type | View quota in Google Cloud console
Online process requests per minute (EU) | 120 per project per processor type | View quota in Google Cloud console
Number of online process document pages (US) per minute per processor type and model version (Custom Extractor v1.4 with Gemini 2.0 Flash only) | 120 pages per minute* | View quota in Google Cloud console
Number of online process document pages (EU) per minute per processor type and model version (Custom Extractor v1.4 with Gemini 2.0 Flash only) | 120 pages per minute* | View quota in Google Cloud console
Number of online process document pages (US) per minute per processor type and model version (Custom Extractor v1.5 with Gemini 2.5 Flash only) | 120 pages per minute* | View quota in Google Cloud console
Number of online process document pages (EU) per minute per processor type and model version (Custom Extractor v1.5 with Gemini 2.5 Flash only) | 120 pages per minute* | View quota in Google Cloud console
Online process requests per minute (single region) | 6 per project per processor type | View quota in Google Cloud console
Concurrent batch process requests per project and region (US) | 5 per project | View quota in Google Cloud console
Concurrent batch process requests per project and region (EU) | 5 per project | View quota in Google Cloud console
Concurrent batch process requests per processor (single region) | 5 per project | View quota in Google Cloud console
Concurrent processor version training requests (US) | 1 per project | View quota in Google Cloud console
Concurrent processor version training requests (EU) | 1 per project | View quota in Google Cloud console
Concurrent processor version training requests (single region) | 1 per project | View quota in Google Cloud console
Deployed custom processor versions (US) | 5 per project | View quota in Google Cloud console
Deployed custom processor versions (EU) | 5 per project | View quota in Google Cloud console
Deployed custom processor versions (single region) | 5 per project | View quota in Google Cloud console
Deployed generative processor versions (US) | 100 per project per custom extraction processor | View quota in Google Cloud console
Deployed generative processor versions (EU) | 100 per project per custom extraction processor | View quota in Google Cloud console
Deployed generative processor versions (single region) | 100 per project per custom extraction processor | View quota in Google Cloud console
Concurrent import documents requests (US) | 3 per project | View quota in Google Cloud console
Concurrent import documents requests (EU) | 3 per project | View quota in Google Cloud console
Concurrent import documents requests (single region) | 3 per project | View quota in Google Cloud console
Concurrent export documents requests (US) | 1 per project | View quota in Google Cloud console
Concurrent export documents requests (EU) | 1 per project | View quota in Google Cloud console
Concurrent export documents requests (single region) | 1 per project | View quota in Google Cloud console

* Quota adjustment requests are not yet supported for this version.

Supported in australia-southeast1 with a quota adjustment request.
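One way to stay under a per-minute request quota is to throttle on the client before sending. The sketch below is a sliding-window limiter that defaults to the 120 requests per minute listed for US and EU online processing. The class name and the rolling 60-second window are assumptions for illustration. This document doesn't describe how the service itself measures the minute, so treat the limiter as a conservative client-side guard only.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit=120, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # Timestamps of calls still inside the window.

    def try_acquire(self):
        """Record a call and return True if it fits under the limit."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True
```

A caller would check `try_acquire()` before each online process request and wait briefly when it returns False. For a single-region processor, the same class could be constructed with `limit=6`.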

Make a capacity reservation request

Document AI capacity reservation provides reserved capacity to serve real-time, high-volume prediction traffic for the subscribed period, helping meet Service Level Agreement (SLA) requirements. Each unit corresponds to an additional page-per-minute beyond the default quota.

Capacity reservation is supported and required for increasing provisioned tier quotas of custom extractor models v1.4 and v1.5, including fine-tuned processor versions built on them.

Capacity reservation costs $300 USD per month for each additional page per minute.
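As a worked example of this pricing, the sketch below estimates the cost of a reservation. The function name is hypothetical, and the calculation assumes the flat rate stated above with no discounts or taxes.

```python
PRICE_PER_PAGE_PER_MINUTE_USD = 300  # Monthly rate per additional page per minute.


def reservation_cost_usd(extra_pages_per_minute, months=1):
    """Estimate the cost of reserving extra pages per minute for `months` months."""
    if extra_pages_per_minute < 0 or months < 1:
        raise ValueError("expected a non-negative page count and at least one month")
    return extra_pages_per_minute * PRICE_PER_PAGE_PER_MINUTE_USD * months


# Raising throughput from the default 120 to 200 pages per minute needs 80 extra units.
print(reservation_cost_usd(80))     # 24000
print(reservation_cost_usd(80, 3))  # 72000
```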

To make a capacity reservation request:

Console

  1. In the Google Cloud console, go to the IAM & Admin > Capacity Reservation page:

    Capacity Reservation

  2. Click the Create new capacity reservation button near the page header. This will take you to a two-page request form.

  3. Fill out the Configure page with the following:

    1. Enter a name for the order.
    2. Select a region.
    3. Select the processor version from the drop-down menu.
    4. Enter the number of additional pages per minute needed per month.
    5. Select the monthly subscription term.
    6. Select start date and time.
    7. Select an auto-renew option from the drop-down.
  4. Click Continue.

  5. On the second page, you'll see an estimated cost per month. You must enter CONFIRM to validate the purchase.

  6. Click Confirm and submit to confirm your order.

You can view the request status in the Capacity Reservation tab.

The three possible statuses are:

  • Inactive: The subscription hasn't started yet.
  • Active: The subscription is ongoing.
  • Completed: The subscription has finished.
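The three statuses follow directly from where the current time falls relative to the subscription period. The sketch below shows that mapping. The function is hypothetical, and treating the end time as exclusive is an assumption, since this document doesn't say how the boundary moments are classified.

```python
from datetime import datetime, timezone


def reservation_status(start, end, now=None):
    """Map a subscription period to Inactive, Active, or Completed."""
    now = now or datetime.now(timezone.utc)
    if now < start:
        return "Inactive"   # The subscription hasn't started yet.
    if now < end:
        return "Active"     # The subscription is ongoing.
    return "Completed"      # The subscription has finished.
```

Pass timezone-aware datetimes for `start`, `end`, and `now` so that the comparisons are valid.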

What to consider before purchasing capacity reservation

To help you decide whether you want to purchase capacity reservation, consider the following:

  • You can't cancel your order in the middle of your term.

    Your capacity reservation purchase is a commitment for the full term, so you can't cancel the order partway through. However, you can increase the number of purchased units. If you accidentally purchase a commitment or there's a problem with your configuration, contact your Google Cloud account representative for assistance.

  • You can auto-renew your subscription.

    When you submit your order, you can choose to auto-renew your subscription at the end of its term, or let the subscription expire. You can cancel the auto-renew process. To cancel your subscription before it auto-renews, cancel the auto-renewal 30 days before the start of the next term.

    You can configure monthly subscriptions to renew automatically each month. Weekly terms don't support automatic renewal.
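The 30-day notice period translates into a simple date calculation. The sketch below computes the last moment at which auto-renewal can be cancelled for a given next-term start. The function names are hypothetical, and treating the deadline itself as still valid is an assumption.

```python
from datetime import datetime, timedelta, timezone

CANCEL_NOTICE = timedelta(days=30)  # Notice period stated above.


def auto_renew_cancel_deadline(next_term_start):
    """Return the latest time at which auto-renewal can be cancelled."""
    return next_term_start - CANCEL_NOTICE


def can_still_cancel(next_term_start, now):
    """Return True if `now` is on or before the cancellation deadline."""
    return now <= auto_renew_cancel_deadline(next_term_start)


next_start = datetime(2025, 3, 1, tzinfo=timezone.utc)
print(auto_renew_cancel_deadline(next_start).date())  # 2025-01-30
```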