Tuned model, text embedding, and Vertex AI Agent Engine quotas and limits

This page provides tuned model quotas, text embedding limits, and Vertex AI Agent Engine quotas.

Tuned model quotas

The following quotas apply to Generative AI on Vertex AI tuned models for a given project and region:

Quota	Supported region *	Value
Restricted image training TPU V3 pod cores per region	europe-west4	64
Restricted image training Nvidia A100 80GB GPUs per region	us-central1	8
Restricted image training Nvidia A100 80GB GPUs per region	us-east4	2

* Tuning scenarios have accelerator reservations in specific regions. Quotas for tuning are supported and must be requested in specific regions.

Text embedding limits

Each text embedding request can include up to 250 input texts (one embedding is generated per input text) and up to 20,000 tokens in total.

Only the first 8,192 tokens in each input text are used to compute the embeddings. Depending on the model, a request might be limited to a single input text.
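
As a rough illustration of these limits, the following Python sketch groups input texts into batches that stay within 250 texts and 20,000 tokens per request. The names `batch_texts` and `approx_token_count` are hypothetical and not part of any Vertex AI SDK. The token counter is a whitespace word count used only as a stand-in, so real token counts from a model tokenizer will differ. The sketch also assumes that the full length of each text counts toward the request total, which this page doesn't state.

```python
from typing import Callable, Iterable, Iterator, List

MAX_TEXTS_PER_REQUEST = 250      # per-request input text limit from this page
MAX_TOKENS_PER_REQUEST = 20_000  # per-request token limit from this page


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words.

    Real token counts depend on the model's tokenizer and will differ.
    """
    return len(text.split())


def batch_texts(
    texts: Iterable[str],
    count_tokens: Callable[[str], int] = approx_token_count,
    max_texts: int = MAX_TEXTS_PER_REQUEST,
    max_tokens: int = MAX_TOKENS_PER_REQUEST,
) -> Iterator[List[str]]:
    """Yield lists of texts that respect both per-request limits."""
    batch: List[str] = []
    batch_tokens = 0
    for text in texts:
        n = count_tokens(text)
        if n > max_tokens:
            # Assumption: such a text can't be sent as-is, so fail loudly.
            raise ValueError("a single input text exceeds the per-request token limit")
        # Start a new batch if adding this text would break either limit.
        if batch and (len(batch) >= max_texts or batch_tokens + n > max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch
```

Each yielded batch could then be sent as one embedding request. For a model that accepts only a single input text per request, passing `max_texts=1` yields one text per batch.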

Vertex AI Agent Engine limits

The following limits apply to Vertex AI Agent Engine for a given project in each region:

Description	Limit
Create/Delete/Update Vertex AI Agent Engine per minute	10
Query/StreamQuery Vertex AI Agent Engine per minute	60
Maximum number of Vertex AI Agent Engine resources	100
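
One way a client might stay under the per-minute limits in the table is to throttle its own calls. The sketch below is a minimal sliding-window limiter in Python. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the sketch assumes a single process. The limits apply per project and region, so several clients sharing a project would need to coordinate, and the service may measure its window differently.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period_s`-second window."""

    def __init__(
        self,
        max_calls: int,
        period_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period_s = period_s
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def _evict(self, now: float) -> None:
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period_s:
            self._calls.popleft()

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self._clock()
        self._evict(now)
        if len(self._calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            wait = self._calls[0] + self.period_s - now
            if wait > 0:
                self._sleep(wait)
            now = self._clock()
            self._evict(now)
        self._calls.append(now)


# Limits taken from the table above.
query_limiter = SlidingWindowLimiter(max_calls=60)   # Query/StreamQuery per minute
admin_limiter = SlidingWindowLimiter(max_calls=10)   # Create/Delete/Update per minute
```

Calling `query_limiter.acquire()` before each query keeps this process at or below 60 queries per minute. Requests can still be rejected if other clients use the same project and region.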

To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.