Tuned model, text embedding, and Vertex AI Agent Engine quotas and limits

This page lists the quotas for tuned models and the limits for text embeddings and Vertex AI Agent Engine.

Tuned model quotas

The following quotas apply to Generative AI on Vertex AI tuned models for a given project and region:

Quota Value
Restricted image training TPU V3 pod cores per region (supported region*: europe-west4) 64
Restricted image training Nvidia A100 80GB GPUs per region (supported region*: us-central1) 8
Restricted image training Nvidia A100 80GB GPUs per region (supported region*: us-east4) 2

* Tuning scenarios have accelerator reservations in specific regions. Tuning quotas are supported only in those regions and must be requested there.
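
Because the quota values depend on both the accelerator type and the region, it can help to check a region before planning a tuning job or a quota request. The following is a minimal sketch that encodes only the values listed above. The accelerator keys (`tpu_v3_pod_cores`, `a100_80gb_gpus`) and the function names are hypothetical labels chosen for illustration, not identifiers from any Google Cloud API.

```python
from typing import List, Optional

# Values copied from the quota list above.
# The accelerator key names are hypothetical labels, not official quota IDs.
TUNING_QUOTAS = {
    ("tpu_v3_pod_cores", "europe-west4"): 64,
    ("a100_80gb_gpus", "us-central1"): 8,
    ("a100_80gb_gpus", "us-east4"): 2,
}


def listed_quota(accelerator: str, region: str) -> Optional[int]:
    """Return the listed quota value, or None if the region is not listed."""
    return TUNING_QUOTAS.get((accelerator, region))


def supported_regions(accelerator: str) -> List[str]:
    """Return the regions listed for an accelerator, sorted by name."""
    return sorted(r for (a, r) in TUNING_QUOTAS if a == accelerator)


if __name__ == "__main__":
    print(listed_quota("a100_80gb_gpus", "us-east4"))
    print(supported_regions("a100_80gb_gpus"))
```

A `None` result here only means that the region does not appear in the list on this page.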

Text embedding limits

Each text embedding model request can have up to 250 input texts (generating 1 embedding per input text) and 20,000 tokens per request.

Only the first 8,192 tokens in each input text are used to compute the embeddings. For some models, a request might include only a single input text.
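
One way to stay within these limits is to group input texts into batches before sending them. The sketch below is illustrative only: `batch_texts` and `whitespace_token_count` are hypothetical helpers, and the whitespace-based counter is a crude stand-in for the model's real tokenizer. It also assumes that tokens beyond the first 8,192 of a text do not count toward the 20,000-token request budget, which this page does not state.

```python
from typing import Callable, Iterable, Iterator, List

MAX_TEXTS_PER_REQUEST = 250      # from the limits above
MAX_TOKENS_PER_REQUEST = 20_000  # from the limits above
MAX_TOKENS_PER_TEXT = 8_192      # tokens actually used per input text


def whitespace_token_count(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def batch_texts(
    texts: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> Iterator[List[str]]:
    """Yield batches that respect the per-request text and token limits."""
    batch: List[str] = []
    batch_tokens = 0
    for text in texts:
        # Assumption: only the tokens that are used count toward the budget.
        tokens = min(count_tokens(text), MAX_TOKENS_PER_TEXT)
        overflow = (
            len(batch) >= MAX_TEXTS_PER_REQUEST
            or batch_tokens + tokens > MAX_TOKENS_PER_REQUEST
        )
        if batch and overflow:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += tokens
    if batch:
        yield batch


if __name__ == "__main__":
    sizes = [len(b) for b in batch_texts(["a short text"] * 600)]
    print(sizes)
```

For a model that accepts only one input text per request, the same idea reduces to sending each text separately.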

Vertex AI Agent Engine limits

The following limits apply to Vertex AI Agent Engine for a given project in each region.

Description Limit
Create/Delete/Update Vertex AI Agent Engine per minute 10
Query/StreamQuery Vertex AI Agent Engine per minute 60
Maximum number of Vertex AI Agent Engine resources 100
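
A client can reduce the chance of hitting the per-minute limits by throttling its own calls. The following is a minimal sliding-window sketch, not part of any Vertex AI SDK: `SlidingWindowLimiter` is a hypothetical class, and the injectable `clock` and `sleep` parameters exist only to make it easy to test. Because the limits apply per project and region, a limiter inside one process does not account for other clients that share the same project.

```python
import time
from collections import deque
from typing import Callable, Deque

QUERIES_PER_MINUTE = 60    # Query/StreamQuery limit from the list above
MUTATIONS_PER_MINUTE = 10  # Create/Delete/Update limit from the list above


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any window of period_s seconds."""

    def __init__(
        self,
        max_calls: int,
        period_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period_s
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until one more call fits inside the current window."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self._period:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self._period - (now - self._calls[0]))


if __name__ == "__main__":
    query_limiter = SlidingWindowLimiter(QUERIES_PER_MINUTE)
    query_limiter.acquire()  # call this before each query request
    print("request slot acquired")
```

A separate limiter instance built with `MUTATIONS_PER_MINUTE` could guard create, delete, and update calls in the same way.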

What's next