Quotas and limits

This document lists the quotas and system limits that apply to Vertex AI Search.

Quotas have default values, but you can typically request adjustments.
System limits are fixed values that can't be changed.

Google Cloud uses quotas to help ensure fairness and reduce spikes in resource use and availability. A quota restricts how much of a Google Cloud resource your Google Cloud project can use. Quotas apply to a range of resource types, including hardware, software, and network components. For example, quotas can restrict the number of API calls to a service, the number of load balancers used concurrently by your project, or the number of projects that you can create. Quotas protect the community of Google Cloud users by preventing the overloading of services. Quotas also help you to manage your own Google Cloud resources.

The Cloud Quotas system does the following:

Monitors your consumption of Google Cloud products and services
Restricts your consumption of those resources
Provides a way to request changes to the quota value and automate quota adjustments

In most cases, when you attempt to consume more of a resource than its quota allows, the system blocks access to the resource, and the task that you're trying to perform fails.
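
Because a request over quota fails instead of waiting, callers usually retry rate-limited requests with exponential backoff. The following is a minimal Python sketch, not part of any Google Cloud client library. `QuotaExceededError` and `call_with_backoff` are hypothetical names, and the sketch assumes that your client surfaces quota exhaustion as an exception. Google Cloud APIs generally report an exhausted quota as HTTP 429 with status `RESOURCE_EXHAUSTED`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical placeholder for the error your client raises on an exhausted quota."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with capped exponential backoff and full jitter
    each time it raises QuotaExceededError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the failure.
            # Upper bound doubles per attempt (1s, 2s, 4s, ...) up to max_delay;
            # sleeping a random fraction of it keeps clients from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Retrying only helps with rate quotas, which refill every minute. An allocation quota stays exhausted until you release resources, so retrying an allocation failure only delays it. The `sleep` parameter is injectable so that tests don't have to wait.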

Quotas generally apply at the Google Cloud project level. Your use of a resource in one project doesn't affect your available quota in another project. Within a Google Cloud project, quotas are shared across all applications and IP addresses.

There are also system limits on Vertex AI Search resources. System limits can't be changed.

Allocation quotas

The following table lists the allocation quotas for the Discovery Engine API. These quotas don't reset over time. Instead, quota is freed when the resource that consumes it is released, for example when you delete a data store. You can request a quota increase if the default quota isn't enough.

Quota	Value
Total number of data stores per project	100*
Total number of engines per project	150†
Number of pending import long running operations per project	300
Number of pending purge documents long running operations per project	100
Number of serving controls per project	1,000
Number of boost serving controls per serving config	100
Number of filter serving controls per serving config	100
Number of redirect serving controls per serving config	100
Number of synonym serving controls per serving config	100
Regional number of data stores per project per location (Global, US, EU)	100
Regional number of documents per project per location (Global, US, EU)	10,000,000
Regional number of engines per project per location (Global, US, EU)	150
Regional number of user events per project per location (Global, US, EU)	40,000,000,000

* Due to a technical limitation, the maximum quota for data stores is 500 per project. If you need more data stores, use additional projects.

† Due to a technical limitation, the maximum quota for engines is 500 per project. If you need more engines, use additional projects.
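
The footnotes put a hard ceiling on how far these two quotas can be raised, so a deployment that needs more data stores or engines has to be spread across projects. The following minimal Python sketch shows that arithmetic. `projects_needed` is a hypothetical helper, the numbers are copied from the table and footnotes, and the sketch ignores the per-location regional quotas, which also cap data stores and engines in each location.

```python
# Defaults from the allocation quota table (can be raised on request) and
# hard caps from the footnotes (can't be raised beyond this).
DEFAULT_QUOTA = {"data_stores": 100, "engines": 150}
HARD_CAP = {"data_stores": 500, "engines": 500}


def projects_needed(resource: str, count: int, use_default_quota: bool = False) -> int:
    """Return the minimum number of projects needed to hold `count` resources.

    By default, assumes every project's quota has been raised to the hard cap;
    pass use_default_quota=True to assume the default quota instead.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    per_project = DEFAULT_QUOTA[resource] if use_default_quota else HARD_CAP[resource]
    # Integer ceiling division: any remainder needs one more project.
    return (count + per_project - 1) // per_project
```

For example, 1,200 data stores need at least 3 projects when each project's quota is raised to 500, or 12 projects at the default quota of 100.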

Rate quotas

The following quotas apply to Discovery Engine API requests. If the default quota isn't enough, you can request a quota increase.

Quota	Value
Complete query requests per minute per project	300
Conversational search read requests per minute per project	300
Conversational search write requests per minute per project	300
Document batch requests (such as inline import and purge) per minute per project	100
Document async import (Cloud Storage, BigQuery, etc.) requests per minute per project	5
Document purge requests per minute per project	100
Document read requests per minute per project	300
Document write requests per minute per project	12,000
Evaluation create requests per day per project	5
Evaluation read requests per minute per project	100
Evaluation write requests per minute per project	100
Grounded generation check grounding requests per minute per project	300
LLM query requests (search summarization, multi-turn search) per minute per project	60
Number of pending FHIR or BigQuery streaming writes per minute	6,000
Number of sample query sets per project	100
Ranking API requests per minute per project	500
Recommend requests per minute per project	60,000
Sample query read requests per minute per project	200
Sample query set read requests per minute per project	100
Sample query set write requests per minute per project	100
Sample query write requests per minute per project	200
Schema read requests per minute per project	100
Schema write requests per minute per project	100
Regional search requests per minute per project per location (Global, US, EU)	300
User event batch requests (such as import and purge) per minute per project	100
User event collect requests per minute per project per user	240
User event write requests per minute per project	60,000
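
You can also stay under these per-minute quotas on the client side by pacing requests before they're sent. The following is a minimal Python sketch of a sliding-window limiter. `PerMinuteLimiter` is a hypothetical helper, and it assumes a rolling 60-second window, which is at least as strict as a fixed per-minute window. Because quotas are shared by every application in the project, give each process only its share of the limit. For example, two workers sharing the 300-per-minute regional search quota could each use `PerMinuteLimiter(150)`.

```python
import collections
import time
from typing import Callable, Deque


class PerMinuteLimiter:
    """Client-side sliding-window limiter: at most `limit` calls in any 60-second window."""

    def __init__(
        self,
        limit: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.window = 60.0
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = collections.deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the 60-second window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                self._sent.append(now)
                return
            # Wait until the oldest call in the window expires, then re-check.
            self._sleep(self.window - (now - self._sent[0]))
```

Call `acquire()` immediately before each request. The class isn't thread-safe, so if several threads share one limiter, guard `acquire()` with a lock.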

Quota for web page indexing

When you have a data store with Advanced website indexing turned on, every web page that you index counts toward the "Regional number of documents per project per location" quota in the Allocation quotas table. To see how many pages your project has indexed and the project's page quota, check the Project pages vs quota field on the Data page for a data store.

If you add websites to a data store and their web pages would push the project over its quota, those websites aren't indexed. Websites that are already indexed in the data store continue to be indexed as before. You can request a quota increase at any time.
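
Before adding websites, you can compare their estimated page counts with the values shown in the Project pages vs quota field. The following minimal Python sketch shows that comparison. `PageQuotaStatus`, `fits_in_quota`, and `headroom` are hypothetical names, and the page counts are inputs you supply (for example, from your own crawl of the site), not values returned by any API. The check treats each batch of new pages as all-or-nothing, because websites that would exceed the quota aren't indexed.

```python
from dataclasses import dataclass


@dataclass
class PageQuotaStatus:
    indexed_pages: int  # pages already counted against the quota
    page_quota: int     # the project's page quota, e.g. the 10,000,000 default


def fits_in_quota(status: PageQuotaStatus, new_site_pages: int) -> bool:
    """Return True if indexing `new_site_pages` more pages stays within the quota."""
    return status.indexed_pages + new_site_pages <= status.page_quota


def headroom(status: PageQuotaStatus) -> int:
    """Number of pages that can still be added before the quota is reached."""
    return max(0, status.page_quota - status.indexed_pages)
```

For example, a project with 9,500,000 indexed pages and a 10,000,000-page quota has room for 500,000 more pages, so a 400,000-page site fits but a 600,000-page site doesn't.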

Request a quota increase

To adjust most quotas, use the Google Cloud console. For more information, see Request a quota adjustment.