Quotas and system limits

This document lists the quotas and system limits that apply to Vertex AI Agent Builder.

  • Quotas have default values, but you can typically request adjustments.
  • System limits are fixed values that can't be changed.

Google Cloud uses quotas to help ensure fairness and reduce spikes in resource use and availability. A quota restricts how much of a Google Cloud resource your Google Cloud project can use. Quotas apply to a range of resource types, including hardware, software, and network components. For example, quotas can restrict the number of API calls to a service, the number of load balancers used concurrently by your project, or the number of projects that you can create. Quotas protect the community of Google Cloud users by preventing the overloading of services. Quotas also help you to manage your own Google Cloud resources.

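The Cloud Quotas system does the following:

  • Monitors your consumption of Google Cloud products and services.
  • Restricts your consumption of those resources.
  • Provides a way to request changes to the quota value.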

In most cases, when you attempt to consume more of a resource than its quota allows, the system blocks access to the resource, and the task that you're trying to perform fails.
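
Because a request that exceeds its quota fails outright, callers often retry it after a pause. The following is a minimal sketch of exponential backoff with jitter, not an official client feature. `QuotaExceededError` and `call_with_backoff` are hypothetical names introduced for illustration; in practice you would catch whatever error your client library raises for quota exhaustion (Google Cloud APIs typically report it as HTTP 429 or `RESOURCE_EXHAUSTED`).

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the quota error your client library raises."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter on quota errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay.
            # Picking a random wait in [0, ceiling] keeps clients from
            # retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable only so that the retry logic can be exercised without real waiting; the defaults for attempts and delays are arbitrary choices, not recommended values.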

Quotas generally apply at the Google Cloud project level. Your use of a resource in one project doesn't affect your available quota in another project. Within a Google Cloud project, quotas are shared across all applications and IP addresses.

Vertex AI Agent Engine quotas

The following quotas apply to Vertex AI Agent Engine for a given project in each region:
| Description | Quota | Metric |
| --- | --- | --- |
| Create, delete, or update Vertex AI Agent Engine resources per minute | 10 | aiplatform.googleapis.com/reasoning_engine_service_write_requests |
| Create, delete, or update Vertex AI Agent Engine sessions per minute | 100 | aiplatform.googleapis.com/session_write_requests |
| Query or StreamQuery Vertex AI Agent Engine per minute | 90 | aiplatform.googleapis.com/reasoning_engine_service_query_requests |
| Append event to Vertex AI Agent Engine sessions per minute | 300 | aiplatform.googleapis.com/session_event_append_requests |
| Maximum number of Vertex AI Agent Engine resources | 100 | aiplatform.googleapis.com/reasoning_engine_service_entities |
| Create, delete, or update Vertex AI Agent Engine memory resources per minute | 100 | aiplatform.googleapis.com/memory_bank_write_requests |
| Get, list, or retrieve from Vertex AI Agent Engine Memory Bank per minute | 300 | aiplatform.googleapis.com/memory_bank_read_requests |
| Sandbox environment (Code Execution) execute requests per minute | 1000 | aiplatform.googleapis.com/sandbox_environment_execute_requests |
| Sandbox environment (Code Execution) entities per region | 1000 | aiplatform.googleapis.com/sandbox_environment_entities |
| A2A Agent post requests like sendMessage and cancelTask per minute | 60 | aiplatform.googleapis.com/a2a_agent_post_requests |
| A2A Agent get requests like getTask and getCard per minute | 600 | aiplatform.googleapis.com/a2a_agent_get_requests |
| Concurrent live bidirectional connections using the BidiStreamQuery API per minute | 10 | aiplatform.googleapis.com/reasoning_engine_service_concurrent_query_requests |
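
To stay under a per-minute quota such as the default of 90 Query or StreamQuery requests, a client can throttle itself before sending. The following is a rough sketch of a sliding-window limiter, assuming a single process issues all the requests. `SlidingWindowLimiter` and its methods are hypothetical names for illustration, and the local window is not guaranteed to line up with how the service measures a minute.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any trailing window of `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps = deque()  # timestamps of calls still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()

    def try_acquire(self):
        """Record a call and return True if a slot is free, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False

    def seconds_until_free(self):
        """Return how long to wait before the next slot opens (0.0 if free now)."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])


# Example: mirror the default query quota of 90 requests per minute.
query_limiter = SlidingWindowLimiter(max_calls=90, period=60.0)
```

Because quotas are shared across all applications in a project, a limiter like this only bounds the traffic of the process that holds it; other callers in the same project and region still draw from the same quota.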

Request a quota adjustment

To adjust most quotas, use the Google Cloud console. For more information, see Request a quota adjustment.