This page lists the quotas for tuned models, the limits for text embedding requests, and the limits for Vertex AI Agent Engine.
Tuned model quotas
The following quotas apply to Generative AI on Vertex AI tuned models for a given project and region:
Quota | Supported region | Value |
---|---|---|
Restricted image training TPU V3 pod cores per region | europe-west4 | 64 |
Restricted image training Nvidia A100 80GB GPUs per region | us-central1 | 8 |
Restricted image training Nvidia A100 80GB GPUs per region | us-east4 | 2 |
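The tuned-model quotas can be read as a lookup keyed by resource and region. The following is a minimal sketch under stated assumptions: the resource keys (`tpu_v3_pod_cores`, `nvidia_a100_80gb_gpus`) and the `fits_quota` helper are hypothetical labels, not Vertex AI API identifiers, and the check only compares a single request against the per-region value, ignoring capacity already used by other jobs in the same project.

```python
# Hypothetical encoding of the tuned-model quota table.
# Keys are (resource, region); values are the per-region limits.
# Resource names are illustrative labels, not Vertex AI API identifiers.
TUNING_QUOTAS = {
    ("tpu_v3_pod_cores", "europe-west4"): 64,
    ("nvidia_a100_80gb_gpus", "us-central1"): 8,
    ("nvidia_a100_80gb_gpus", "us-east4"): 2,
}


def fits_quota(resource: str, region: str, requested: int) -> bool:
    """Return True if `requested` units of `resource` fit the region's quota.

    A (resource, region) pair missing from the table is treated as
    unsupported, i.e. a quota of 0. Usage by other jobs is not considered.
    """
    limit = TUNING_QUOTAS.get((resource, region), 0)
    return 0 < requested <= limit
```

For example, `fits_quota("nvidia_a100_80gb_gpus", "us-east4", 4)` returns `False` because only 2 GPUs are available in that region, while the same request in `us-central1` fits.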
Text embedding limits
Each text embedding model request can include up to 250 input texts (one embedding is generated per input text) and up to 20,000 tokens in total.
Only the first 8,192 tokens of each input text are used to compute its embedding. Depending on the model, a request might be limited to a single input text.
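When embedding a large collection, the inputs have to be split into requests that respect both the 250-text and the 20,000-token limits. The following is a minimal client-side sketch, assuming token counts are computed beforehand by some tokenizer (not shown) and, conservatively, that each text's full token count counts toward the per-request budget even when only its first 8,192 tokens are embedded. `plan_batches` and `is_truncated` are hypothetical helpers, not part of the Vertex AI SDK.

```python
from typing import List, Sequence

MAX_TEXTS_PER_REQUEST = 250
MAX_TOKENS_PER_REQUEST = 20_000
MAX_TOKENS_PER_TEXT = 8_192  # tokens beyond this are not used for the embedding


def plan_batches(token_counts: Sequence[int],
                 max_texts: int = MAX_TEXTS_PER_REQUEST,
                 max_tokens: int = MAX_TOKENS_PER_REQUEST) -> List[List[int]]:
    """Group input-text indices into requests that respect both limits.

    Greedy and order-preserving: a new batch starts when adding the next
    text would exceed either the text-count limit or the token budget.
    Assumption: each text's full token count counts toward the budget.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for i, n in enumerate(token_counts):
        if n > max_tokens:
            raise ValueError(f"text {i} has {n} tokens, above the per-request limit")
        # Close the current batch if this text would overflow either limit.
        if current and (len(current) == max_texts or current_tokens + n > max_tokens):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches


def is_truncated(token_count: int) -> bool:
    """True if only a prefix of the text contributes to its embedding."""
    return token_count > MAX_TOKENS_PER_TEXT
```

For example, three texts of 8,000 tokens each are planned as `[[0, 1], [2]]`, since a third text would push the first request to 24,000 tokens. For a model that accepts only a single input text per request, pass `max_texts=1`.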
Vertex AI Agent Engine limits
The following limits apply to Vertex AI Agent Engine for a given project in each region.
Description | Limit |
---|---|
Create, delete, or update requests for Vertex AI Agent Engine resources per minute | 10 |
Query or StreamQuery requests to Vertex AI Agent Engine per minute | 60 |
Maximum number of Vertex AI Agent Engine resources | 100 |
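A caller that sends many queries may prefer to stay under the per-minute limits on the client side rather than relying on the service to reject excess requests. The following is a minimal sketch, assuming the limits behave like a rolling 60-second window; this page does not say whether the service uses a rolling or a fixed window. `PerMinuteLimiter` is a hypothetical helper, not part of the Vertex AI SDK, and it does not track the 100-resource maximum.

```python
import collections
import time
from typing import Callable, Deque


class PerMinuteLimiter:
    """Client-side sliding-window limiter: at most `limit` calls per window.

    Assumption: the service limit is modeled as a rolling 60-second window.
    """

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic,
                 window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._calls: Deque[float] = collections.deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            self._calls.append(now)
            return True
        return False


# Hypothetical limiters mirroring the table values.
query_limiter = PerMinuteLimiter(limit=60)  # Query or StreamQuery
admin_limiter = PerMinuteLimiter(limit=10)  # create, delete, or update
```

A caller would check `query_limiter.try_acquire()` before each query and wait or back off when it returns `False`. Tracking timestamps keeps the check exact for a rolling window; a token bucket would use less memory at the cost of allowing short bursts.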
What's next
- To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
- To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.