Vertex AI RAG Engine quotas and limits

The following quotas apply to each service that RAG Engine uses to perform retrieval-augmented generation (RAG). Each quota is measured in requests per minute (RPM).

| Service | Quota | Metric |
| --- | --- | --- |
| RAG Engine data management APIs | 60 RPM | `VertexRagDataService requests per minute per region` |
| `RetrievalContexts` API | 1,500 RPM | `VertexRagService retrieve requests per minute per region` |
| `base_model: textembedding-gecko` | 1,500 RPM | `Online prediction requests per base model per minute per region per base_model`. You can also specify the filter `base_model: textembedding-gecko`. |
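
As an illustration of how a client might stay under these per-minute quotas, the following is a minimal sketch of a client-side throttle that spaces calls evenly. It is not part of any Vertex AI SDK: the `RpmThrottle` class and the dictionary keys are hypothetical names chosen for this example, and only the numeric values come from the table above. The quota is still enforced server-side, so a real client should also handle quota errors.

```python
import time

# Quota values copied from the table above (requests per minute per region).
# The dictionary keys are hypothetical labels, not official service names.
QUOTAS_RPM = {
    "data_management": 60,
    "retrieval_contexts": 1500,
    "textembedding_gecko": 1500,
}


class RpmThrottle:
    """Spaces calls evenly so that at most `rpm` calls start per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm  # minimum seconds between call starts
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Blocks until the next call is allowed; returns the seconds slept."""
        now = self._clock()
        if self._next_allowed is None or now >= self._next_allowed:
            delay = 0.0
            start = now
        else:
            delay = self._next_allowed - now
            self._sleep(delay)
            start = self._next_allowed
        self._next_allowed = start + self.interval
        return delay
```

A caller would create one throttle per quota, for example `RpmThrottle(QUOTAS_RPM["data_management"])`, and call `wait()` immediately before each request. Even spacing is a conservative choice: it never allows bursts, which keeps the sketch simple at the cost of some latency.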

The following limits apply:

| Service | Limit | Metric |
| --- | --- | --- |
| Concurrent `ImportRagFiles` requests | 3 RPM | `VertexRagService concurrent import requests per region` |
| Maximum number of files per `ImportRagFiles` request | 10,000 | `VertexRagService import rag files requests per region` |
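
To show how these two limits might shape a bulk import, here is a minimal sketch that splits a file list into batches of at most 10,000 entries and runs at most three batches at a time. It assumes the first limit means three import requests in flight at once. The `import_batch` callable is a hypothetical placeholder for whatever code issues the actual `ImportRagFiles` request; it is not a real SDK function.

```python
from concurrent.futures import ThreadPoolExecutor

# Values copied from the limits table above.
MAX_FILES_PER_IMPORT = 10_000
MAX_CONCURRENT_IMPORTS = 3  # assumption: read as three requests in flight


def split_into_batches(paths, batch_size=MAX_FILES_PER_IMPORT):
    """Returns consecutive slices of `paths`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def import_all(paths, import_batch, max_workers=MAX_CONCURRENT_IMPORTS):
    """Calls import_batch(batch) for every batch, at most max_workers at once.

    `import_batch` is a hypothetical caller-supplied function that sends one
    import request for the given list of files and returns its result.
    Results are returned in batch order.
    """
    batches = split_into_batches(list(paths))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(import_batch, batches))
```

For example, a list of 25,000 file paths would be split into three batches of 10,000, 10,000, and 5,000 entries, and all three would fit within the concurrency bound.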

For more rate limits and quotas, see Generative AI on Vertex AI rate limits.

What's next

To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-04-12 UTC.