The following quotas apply to each service that performs retrieval-augmented generation (RAG)
using RAG Engine. Each quota is measured in requests per minute (RPM).
| Service | Quota | Metric |
|---|---|---|
| RAG Engine data management APIs | 60 RPM | VertexRagDataService requests per minute per region |
| RetrievalContexts API | 1,500 RPM | VertexRagService retrieve requests per minute per region |
| base_model: textembedding-gecko | 1,500 RPM | Online prediction requests per base model per minute per region per base_model |

An additional filter for you to specify is base_model: textembedding-gecko.
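One way to stay within a per-minute quota such as those listed above is to throttle requests on the client side. The following is a minimal sketch of a sliding-window limiter, not part of any Google Cloud SDK. The class name `RpmLimiter`, its `acquire` method, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for illustration.

```python
import time
from collections import deque


class RpmLimiter:
    """Hypothetical client-side limiter: at most `rpm` calls per 60 seconds."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.sent = deque()   # timestamps of calls made within the last 60 seconds

    def acquire(self):
        """Block until one more request fits within the quota, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the 60-second window.
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()
            if len(self.sent) < self.rpm:
                self.sent.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(60.0 - (now - self.sent[0]))


# Quota values taken from the list above (per region).
DATA_MANAGEMENT_RPM = 60
RETRIEVAL_RPM = 1500
```

A caller would create one limiter per quota, for example `RpmLimiter(DATA_MANAGEMENT_RPM)`, and call `acquire()` immediately before each request. The limiter only tracks requests from a single process, so several clients sharing a project and region would need to split the budget among themselves.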
The following limits apply:
| Service | Limit | Metric |
|---|---|---|
| Concurrent ImportRagFiles requests | 3 RPM | VertexRagService concurrent import requests per region |
| Maximum number of files per ImportRagFiles request | 10,000 | VertexRagService import rag files requests per region |
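The limits above suggest splitting a large file list into batches of at most 10,000 and running no more than three imports at a time. The sketch below shows one way to do that, assuming a caller-supplied function `import_batch` (hypothetical) that issues a single ImportRagFiles request for a list of paths. The helper names `batches` and `import_all` are also illustrative and do not come from any SDK.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_FILES_PER_IMPORT = 10_000   # per-request file limit listed above
MAX_CONCURRENT_IMPORTS = 3      # concurrent import limit listed above


def batches(paths, size=MAX_FILES_PER_IMPORT):
    """Yield consecutive slices of `paths`, each with at most `size` items."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def import_all(paths, import_batch):
    """Call `import_batch` once per batch, with at most three calls in flight.

    `import_batch` is a hypothetical callable that sends one ImportRagFiles
    request; results are returned in batch order.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_IMPORTS) as pool:
        return list(pool.map(import_batch, batches(paths)))
```

For example, passing 25,000 paths produces three batches of 10,000, 10,000, and 5,000 files. The thread pool caps concurrency only within one process, so other clients importing in the same region still count toward the same limit.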
Last updated 2025-04-12 UTC.