This page introduces you to RagManagedDb and shows you how to manage its tier configuration as well as the RAG corpus-level retrieval strategy.
Vertex AI RAG Engine uses RagManagedDb, which is an enterprise-ready vector database powered by Spanner, to store and manage vector representations of your documents. The vector database is then used to retrieve relevant documents based on the document's semantic similarity to a given query.
Manage your retrieval strategy
RagManagedDb offers the following retrieval strategies to support your RAG use cases:
Retrieval strategy | Description
---|---
k-Nearest Neighbors (KNN) (Default) | Finds the exact nearest neighbors by comparing all data points in your RAG corpus. If you don't specify a strategy during the creation of your RAG corpus, KNN is the default retrieval strategy used.
Approximate Nearest Neighbors (ANN) | Uses approximation techniques to find similar neighbors faster than the KNN technique.
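To make the KNN description concrete, the following is a minimal sketch of an exact search that scores every stored vector against the query. It is an illustration only, not how RagManagedDb is implemented: the function names, the toy two-dimensional embeddings, and the choice of cosine similarity as the similarity measure are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_search(query, vectors, k):
    """Exact search: score every stored vector, then keep the top k."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings keyed by document ID.
corpus = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.9, 0.1],
}
top_two = knn_search([1.0, 0.0], corpus, k=2)
print(top_two)
```

Because every vector is compared, the cost of this kind of search grows with the size of the corpus, which is the trade-off that the ANN strategy addresses.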
Create a RAG corpus with KNN RagManagedDb
This code sample demonstrates how to create a RAG corpus using KNN RagManagedDb.
Python
from vertexai import rag
import vertexai

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"
DISPLAY_NAME = "YOUR_RAG_CORPUS_DISPLAY_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

# KNN is the default retrieval strategy; it is set explicitly here.
vector_db = rag.RagManagedDb(retrieval_strategy=rag.KNN())
rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME,
    backend_config=rag.RagVectorDbConfig(vector_db=vector_db),
)
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- CORPUS_DISPLAY_NAME: The display name of the RAG corpus.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
CORPUS_DISPLAY_NAME=CORPUS_DISPLAY_NAME
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora \
  -d '{
    "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
    "vector_db_config": {
      "ragManagedDb": {
        "knn": {}
      }
    }
  }'
Create a RAG corpus with ANN RagManagedDb
To offer the ANN feature, RagManagedDb uses a tree-based structure to partition data and facilitate faster searches. To enable the best recall and latency, the structure of this tree should be configured by experimentation to fit your data size and distribution. RagManagedDb lets you configure the tree_depth and the leaf_count of the tree.
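The idea of partitioning data to speed up a search can be illustrated with a single level of partitions. The following sketch is a simplified stand-in, not the RagManagedDb index: it assumes fixed, hand-picked centroids, Euclidean distance, and hypothetical helper names. Each vector is assigned to its closest centroid, and a query scans only the partition whose centroid is nearest.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_leaves(vectors, centroids):
    """Assign every vector to the leaf whose centroid is closest."""
    leaves = {i: [] for i in range(len(centroids))}
    for doc_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: distance(vec, centroids[i]))
        leaves[nearest].append(doc_id)
    return leaves


def ann_search(query, vectors, centroids, leaves, k):
    """Search only the leaf nearest to the query instead of the whole corpus."""
    leaf = min(range(len(centroids)), key=lambda i: distance(query, centroids[i]))
    candidates = sorted(leaves[leaf], key=lambda doc_id: distance(query, vectors[doc_id]))
    return candidates[:k]


# Hypothetical toy embeddings and hand-picked centroids.
corpus = {
    "doc-a": [0.0, 0.1],
    "doc-b": [0.2, 0.0],
    "doc-c": [5.0, 5.1],
    "doc-d": [5.2, 4.9],
}
centroids = [[0.1, 0.05], [5.1, 5.0]]
leaves = build_leaves(corpus, centroids)
result = ann_search([5.0, 5.0], corpus, centroids, leaves, k=1)
print(leaves, result)
```

The search is approximate because a true nearest neighbor that landed in a different partition is never examined, which is why the shape of the structure affects recall as well as latency.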
The tree_depth determines the number of layers, or levels, in the tree. Follow these guidelines:
- If you have approximately 10,000 RAG files in the RAG corpus, set the value to 2.
- If you have more RAG files than that, set the value to 3.
- If the tree_depth isn't specified, Vertex AI RAG Engine assigns a default value of 2 for this parameter.
The leaf_count determines the number of leaf nodes in the tree-based structure. Each leaf node contains groups of closely related vectors along with their corresponding centroid. Follow these guidelines:
- The recommended value is 10 * sqrt(num of RAG files in your RAG corpus).
- If the leaf_count isn't specified, Vertex AI RAG Engine assigns a default value of 500 for this parameter.
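The sizing guidelines can be turned into a small helper that produces starting values for experimentation. This is a sketch under stated assumptions: the function name is hypothetical, "approximately 10,000" is treated as a hard cutoff at 10,000 files, and the leaf count is rounded to the nearest integer.

```python
import math


def suggest_ann_parameters(num_rag_files):
    """Apply the sizing guidelines to a RAG file count."""
    # Assumption: "approximately 10,000" is treated as a hard cutoff.
    tree_depth = 2 if num_rag_files <= 10_000 else 3
    # Recommended leaf count: 10 * sqrt(number of RAG files).
    leaf_count = round(10 * math.sqrt(num_rag_files))
    return tree_depth, leaf_count


small = suggest_ann_parameters(2_500)
large = suggest_ann_parameters(1_000_000)
print(small, large)
```

For 2,500 RAG files, the formula yields a leaf count of 500, which matches the default value. Treat the output as a starting point, because the best values depend on your data size and distribution.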
Python
from vertexai import rag
import vertexai

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"
DISPLAY_NAME = "YOUR_RAG_CORPUS_DISPLAY_NAME"
TREE_DEPTH = 2  # Optional: Acceptable values are 2 or 3. Default is 2.
LEAF_COUNT = 500  # Optional: Default is 500.

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

ann_config = rag.ANN(tree_depth=TREE_DEPTH, leaf_count=LEAF_COUNT)
vector_db = rag.RagManagedDb(retrieval_strategy=ann_config)
rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME,
    backend_config=rag.RagVectorDbConfig(vector_db=vector_db),
)
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- CORPUS_DISPLAY_NAME: The display name of the RAG corpus.
- TREE_DEPTH: Your tree depth.
- LEAF_COUNT: Your leaf count.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
CORPUS_DISPLAY_NAME=CORPUS_DISPLAY_NAME
TREE_DEPTH=TREE_DEPTH
LEAF_COUNT=LEAF_COUNT
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora \
  -d '{
    "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
    "vector_db_config": {
      "ragManagedDb": {
        "ann": {
          "tree_depth": '"${TREE_DEPTH}"',
          "leaf_count": '"${LEAF_COUNT}"'
        }
      }
    }
  }'
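The KNN and ANN request bodies differ only in the strategy object nested under ragManagedDb. If you assemble the body in Python instead of in the shell, a minimal sketch might look like the following. The helper name build_create_corpus_body and the sample display name are hypothetical; the field names are taken from the REST samples on this page.

```python
import json


def build_create_corpus_body(display_name, strategy="knn", tree_depth=None, leaf_count=None):
    """Build the JSON body for a ragCorpora create request."""
    if strategy == "knn":
        strategy_config = {}
    elif strategy == "ann":
        # Omitted values are left out so that the service defaults apply.
        strategy_config = {
            key: value
            for key, value in (("tree_depth", tree_depth), ("leaf_count", leaf_count))
            if value is not None
        }
    else:
        raise ValueError(f"Unknown retrieval strategy: {strategy}")
    return {
        "display_name": display_name,
        "vector_db_config": {"ragManagedDb": {strategy: strategy_config}},
    }


body = build_create_corpus_body("my-corpus", strategy="ann", tree_depth=2, leaf_count=500)
knn_body = build_create_corpus_body("my-corpus")
print(json.dumps(body, indent=2))
```

Serializing with json.dumps avoids the nested shell quoting that the curl samples need in order to interpolate variables.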
Importing your data into ANN RagManagedDb
You can use either the ImportRagFiles API or the UploadRagFile API to import your data into the ANN RagManagedDb. However, unlike the KNN retrieval strategy, the ANN approach requires the underlying tree-based index to be rebuilt at least once, and optionally again after you import significant amounts of data, for optimal recall. To have Vertex AI RAG Engine rebuild your ANN index, set rebuild_ann_index to true in your ImportRagFiles API request.
Keep the following requirements in mind:
- Before you query the RAG corpus, you must rebuild the ANN index at least once.
- Only one concurrent index rebuild is supported on a project in each location.
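One way to apply these rules is to decide, before each import, whether to request a rebuild. The following sketch is a hypothetical policy, not documented guidance: the function name, its parameters, and the 20% growth threshold used to stand in for "significant amounts of data" are all assumptions that you would tune for your own corpus.

```python
def should_rebuild_ann_index(index_built_before, files_in_index, files_after_import,
                             growth_threshold=0.2):
    """Decide whether an import should request an ANN index rebuild."""
    if not index_built_before:
        # The index must be rebuilt at least once before the corpus is queried.
        return True
    if files_in_index == 0:
        # Nothing was indexed yet, so any imported file warrants a rebuild.
        return files_after_import > 0
    # Assumption: growth of 20% or more counts as a significant import.
    growth = (files_after_import - files_in_index) / files_in_index
    return growth >= growth_threshold


first_import = should_rebuild_ann_index(False, 0, 100)
small_import = should_rebuild_ann_index(True, 1000, 1050)
large_import = should_rebuild_ann_index(True, 1000, 1500)
print(first_import, small_import, large_import)
```

Because only one index rebuild can run at a time on a project in each location, a policy like this also helps avoid requesting rebuilds more often than necessary.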
To upload your local file into your RAG corpus, see Upload a RAG file. To import data into your RAG corpus and trigger an ANN index rebuild, see the following code sample that demonstrates how to import from Cloud Storage. To learn about the supported data sources, see Data sources supported for RAG.
Python
import asyncio

from vertexai import rag
import vertexai

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"
CORPUS_ID = "YOUR_CORPUS_ID"
PATHS = ["gs://my_bucket/my_files_dir"]
REBUILD_ANN_INDEX = True  # Choose True or False.

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)
corpus_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/{CORPUS_ID}"


async def main():
    # This is a non-blocking call.
    response = await rag.import_files_async(
        corpus_name=corpus_name,
        paths=PATHS,
        rebuild_ann_index=REBUILD_ANN_INDEX,
    )
    # Wait for the import to complete.
    await response.result()


# Top-level await isn't valid in a regular Python script, so run the coroutine in an event loop.
asyncio.run(main())
REST
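Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- CORPUS_ID: The ID of your RAG corpus.
- GCS_URI: The Cloud Storage URI of the files to import.
- REBUILD_ANN_INDEX: Whether to rebuild the ANN index, either true or false.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
CORPUS_ID=CORPUS_ID
GCS_URI=GCS_URI
REBUILD_ANN_INDEX=true  # Set to true or false.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${CORPUS_ID}/ragFiles:import \
  -d '{
    "import_rag_files_config": {
      "gcs_source": {
        "uris": ['\""${GCS_URI}"\"']
      },
      "rebuild_ann_index": '"${REBUILD_ANN_INDEX}"'
    }
  }'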
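The import request can also be assembled in Python before it is sent with an HTTP client of your choice. The following is a minimal sketch: the helper name build_import_request and the sample project, region, and corpus values are hypothetical, while the URL pattern and field names come from the REST sample.

```python
import json


def build_import_request(project_id, location, corpus_id, gcs_uris, rebuild_ann_index):
    """Return the URL and JSON body for a ragFiles:import request."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/"
        f"ragCorpora/{corpus_id}/ragFiles:import"
    )
    body = {
        "import_rag_files_config": {
            # uris is a list, so several Cloud Storage locations can be passed at once.
            "gcs_source": {"uris": list(gcs_uris)},
            "rebuild_ann_index": bool(rebuild_ann_index),
        }
    }
    return url, body


# Hypothetical sample values for illustration only.
url, body = build_import_request(
    "my-project", "us-central1", "1234", ["gs://my_bucket/my_files_dir"], True
)
payload = json.dumps(body)
print(url)
print(payload)
```

The sketch only builds the request. Authentication, for example the bearer token that the curl sample obtains from gcloud, still has to be added when the request is sent.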
Manage your tier
Vertex AI RAG Engine lets you scale your RagManagedDb instance based on your usage and performance requirements using a choice of two tiers:
Enterprise tier (default): This tier offers production-scale performance along with autoscaling functionality. It is suitable for customers with large amounts of data or performance-sensitive workloads.
Basic tier: This tier offers a cost-effective and low-compute option, which might be suitable for the following cases:
- Experimenting with RagManagedDb.
- Small data size.
- Latency-insensitive workloads.
- Using Vertex AI RAG Engine only with other vector databases.
The tier is a project-level setting available under the RagEngineConfig resource and impacts RAG corpora using RagManagedDb. To get or update the tier, use the GetRagEngineConfig API and the UpdateRagEngineConfig API, respectively.
Read your RagEngineConfig
The following sample code demonstrates how to read your RagEngineConfig:
Python
from vertexai import rag
import vertexai
PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)
rag_engine_config = rag.rag_data.get_rag_engine_config(
    name=f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
)
print(rag_engine_config)
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig
Update your RagEngineConfig to the Enterprise tier
The following code samples demonstrate how to set the RagEngineConfig to the Enterprise tier:
Python
from vertexai import rag
import vertexai
PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)
rag_engine_config_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
new_rag_engine_config = rag.RagEngineConfig(
    name=rag_engine_config_name,
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Enterprise()),
)
updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)
print(updated_rag_engine_config)
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
curl -X PATCH \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig \
  -d '{"ragManagedDbConfig": {"enterprise": {}}}'
Update your RagEngineConfig to the Basic tier
If you have a large amount of data in your RagManagedDb across all of your RAG corpora, then downgrading to the Basic tier might fail. The downgrade succeeds only if the compute and storage capacity of the Basic tier can hold your existing resources.
Python
from vertexai import rag
import vertexai
PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)
rag_engine_config_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
new_rag_engine_config = rag.RagEngineConfig(
    name=rag_engine_config_name,
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Basic()),
)
updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)
print(updated_rag_engine_config)
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
curl -X PATCH \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig \
  -d '{"ragManagedDbConfig": {"basic": {}}}'
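The two tier updates send the same PATCH request and differ only in the key nested under ragManagedDbConfig. A minimal sketch of building either request in Python follows. The helper name build_tier_update and the sample project and region values are hypothetical; the URL pattern and field names come from the REST samples.

```python
import json

VALID_TIERS = ("enterprise", "basic")


def build_tier_update(project_id, location, tier):
    """Return the URL and JSON body for a ragEngineConfig PATCH request."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}, got {tier!r}")
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/ragEngineConfig"
    )
    # The selected tier is expressed as an empty object under its own key.
    body = json.dumps({"ragManagedDbConfig": {tier: {}}})
    return url, body


# Hypothetical sample values for illustration only.
tier_url, tier_body = build_tier_update("my-project", "us-central1", "basic")
print(tier_url)
print(tier_body)
```

Validating the tier name before sending the request catches typos locally instead of relying on an API error.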
What's next
- To import files and folders from Google Drive or Cloud Storage, see Import RAG files example.
- To list RAG files, see List RAG files example.