Service | Quota | Metric |
---|---|---|
RAG Engine data management APIs | 60 RPM | VertexRagDataService requests per minute per region |
RetrievalContexts API | 1,500 RPM | VertexRagService retrieve requests per minute per region |
base_model: textembedding-gecko | 1,500 RPM | Online prediction requests per base model per minute per region per base_model. An additional filter for you to specify is base_model: textembedding-gecko. |
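
One way to stay under a requests-per-minute quota is to space calls on the client side. The following is a minimal sketch, not part of the RAG Engine SDK: the class name `MinIntervalThrottle` and the constant names are hypothetical, and the RPM values are copied from the quota table above.

```python
import time

# Quota values taken from the table above (requests per minute per region).
DATA_MANAGEMENT_RPM = 60
RETRIEVE_CONTEXTS_RPM = 1500


class MinIntervalThrottle:
    """Hypothetical helper: keeps call starts at least 60/rpm seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock          # injectable so the helper can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

A caller would create one throttle per quota, for example `MinIntervalThrottle(DATA_MANAGEMENT_RPM)`, and call `wait()` immediately before each request. Even spacing is a conservative choice: it gives up burst capacity in exchange for never exceeding the per-minute count from a single client. Quotas are shared across all clients in a project and region, so this alone does not rule out quota errors.
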

Service | Limit | Metric |
---|---|---|
Concurrent ImportRagFiles requests | 3 RPM | VertexRagService concurrent import requests per region |
Maximum number of files per ImportRagFiles request | 10,000 | VertexRagService import rag files requests per region |
For more rate limits and quotas, see Generative AI on Vertex AI rate limits.
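
The import limits above suggest splitting large file lists before calling ImportRagFiles. The following is a minimal sketch under stated assumptions: `batch_for_import` and `import_in_batches` are hypothetical helpers, `import_batch` stands in for whatever function in your code issues a single ImportRagFiles request, and the value 3 is read as "at most three import requests in flight at once".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

# Values taken from the limits table above.
MAX_FILES_PER_IMPORT = 10_000
MAX_CONCURRENT_IMPORTS = 3  # assumption: limit on in-flight import requests


def batch_for_import(
    uris: Sequence[str], batch_size: int = MAX_FILES_PER_IMPORT
) -> Iterator[List[str]]:
    """Yield consecutive slices of `uris`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(uris), batch_size):
        yield list(uris[start:start + batch_size])


def import_in_batches(uris: Sequence[str], import_batch: Callable[[List[str]], object]):
    """Run the caller-supplied `import_batch` on each batch.

    The thread pool size caps how many batches are processed at the same time.
    Results are returned in batch order.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_IMPORTS) as pool:
        return list(pool.map(import_batch, batch_for_import(uris)))
```

For example, a list of 25,000 URIs would be split into batches of 10,000, 10,000, and 5,000 files, with no more than three batches being imported concurrently.
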
What's next
- To learn about the file size limits, see Supported document types.
- To learn about quotas related to RAG Engine, see RAG Engine quotas.
- To learn about customizing parameters, see Retrieval parameters.
- To learn more about the RAG API, see RAG Engine API.
- To learn more about grounding, see Grounding overview.
- To learn more about the difference between grounding and RAG, see Ground responses using RAG.
- To learn more about Generative AI on Vertex AI, see Overview of Generative AI on Vertex AI.
- To learn more about RAG architecture, see the following reference architectures: