Use RagManagedDb with Vertex AI RAG Engine

This page introduces you to RagManagedDb and shows you how to manage its tier configuration as well as the RAG corpus-level retrieval strategy.

Vertex AI RAG Engine uses RagManagedDb, an enterprise-ready vector database powered by Spanner, to store and manage vector representations of your documents. The vector database is then used to retrieve relevant documents based on their semantic similarity to a given query.

Manage your retrieval strategy

RagManagedDb offers the following retrieval strategies to support your RAG use cases:

k-Nearest Neighbors (KNN) (default): Finds the exact nearest neighbors by comparing the query against every data point in your RAG corpus. If you don't specify a strategy when you create your RAG corpus, KNN is used.

  • Guarantees perfect recall (1.0) during retrieval.
  • Well suited to recall-sensitive applications.
  • Well suited to small to medium-sized RAG corpora (fewer than roughly 10,000 RAG files).
  • Because every data point must be searched, query latency grows with the number of RAG files in the corpus.

Approximate Nearest Neighbors (ANN): Uses approximation techniques to find similar neighbors faster than KNN.

  • Significantly reduces query latency on large RAG corpora.
  • Slightly lowers recall because of the approximation techniques used.
  • Most effective on large RAG corpora (roughly more than 10,000 RAG files).
  • How much recall loss is acceptable depends on your use case, but for most large-scale workloads, trading a small amount of recall for better query performance is worthwhile.
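As a rough illustration of how this choice maps onto the Python SDK used later on this page, the following sketch picks a retrieval strategy based on corpus size. The 10,000-file threshold is the guideline above, not an API limit:

from vertexai import rag

# Hypothetical corpus size, used only for illustration.
num_rag_files = 25_000

# Follow the guidance above: KNN for small to medium corpora, ANN for large ones.
if num_rag_files < 10_000:
    retrieval_strategy = rag.KNN()
else:
    # tree_depth and leaf_count can also be tuned; see the ANN section below.
    retrieval_strategy = rag.ANN()

vector_db = rag.RagManagedDb(retrieval_strategy=retrieval_strategy)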

Create a RAG corpus with KNN RagManagedDb

This code sample demonstrates how to create a RAG corpus that uses KNN RagManagedDb.

Python

from vertexai import rag
import vertexai

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"
DISPLAY_NAME = "YOUR_RAG_CORPUS_DISPLAY_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

vector_db = rag.RagManagedDb(retrieval_strategy=rag.KNN())
rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME, backend_config=rag.RagVectorDbConfig(vector_db=vector_db))
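The returned corpus object includes the full resource name, which contains the corpus ID used in the import samples later on this page; printing it here is a convenient way to capture it (the name attribute is assumed from the SDK's resource objects):

print(rag_corpus.name)  # projects/.../locations/.../ragCorpora/CORPUS_ID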

REST

Replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
  • CORPUS_DISPLAY_NAME: The display name of the RAG corpus.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
CORPUS_DISPLAY_NAME=CORPUS_DISPLAY_NAME

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora \
-d '{
      "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
      "vector_db_config": {
        "ragManagedDb": {
          "knn": {}
        }
      }
    }'

Create a RAG corpus with ANN RagManagedDb

To support the ANN feature, RagManagedDb uses a tree-based structure to partition your data and speed up searches. For the best recall and latency, tune the structure of this tree through experimentation to fit your data size and distribution. RagManagedDb lets you configure the tree_depth and the leaf_count of the tree.

The tree_depth determines the number of levels (layers) in the tree. Follow these guidelines:

  • If you have approximately 10,000 RAG files in the RAG corpus, set the value to 2.
  • If you have more RAG files than that, set the value to 3.
  • If the tree_depth isn't specified, Vertex AI RAG Engine assigns a default value of 2 for this parameter.

The leaf_count determines the number of leaf nodes in the tree-based structure. Each leaf node contains groups of closely related vectors along with their corresponding centroid. Follow these guidelines:

  • The recommended value is 10 * sqrt(number of RAG files in your RAG corpus); see the sketch after this list for a worked example.
  • If the leaf_count isn't specified, Vertex AI RAG Engine assigns a default value of 500 for this parameter.
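To make these guidelines concrete, the following minimal sketch (plain Python, not part of the SDK) derives candidate values from the number of RAG files in a corpus. The helper and the 10,000-file cutoff reflect the guidance above, not an API requirement:

import math

def suggest_ann_params(num_rag_files: int) -> tuple[int, int]:
    """Suggest tree_depth and leaf_count from the guidelines above."""
    tree_depth = 2 if num_rag_files <= 10_000 else 3
    leaf_count = int(10 * math.sqrt(num_rag_files))
    return tree_depth, leaf_count

# Example: a corpus with 40,000 RAG files gives
# tree_depth = 3 and leaf_count = 10 * sqrt(40000) = 2000.
print(suggest_ann_params(40_000))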

Python

from vertexai import rag
import vertexai

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"
DISPLAY_NAME = "YOUR_RAG_CORPUS_DISPLAY_NAME"
TREE_DEPTH = YOUR_TREE_DEPTH # Optional: Acceptable values are 2 or 3. Default is 2.
LEAF_COUNT = YOUR_LEAF_COUNT # Optional: Default is 500.

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

ann_config = rag.ANN(tree_depth=TREE_DEPTH, leaf_count=LEAF_COUNT)
vector_db = rag.RagManagedDb(retrieval_strategy=ann_config)
rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME, backend_config=rag.RagVectorDbConfig(vector_db=vector_db))

REST

Replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
  • CORPUS_DISPLAY_NAME: The display name of the RAG corpus.
  • TREE_DEPTH: Your tree depth.
  • LEAF_COUNT: Your leaf count.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
CORPUS_DISPLAY_NAME=CORPUS_DISPLAY_NAME
TREE_DEPTH=TREE_DEPTH
LEAF_COUNT=LEAF_COUNT

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora \
-d '{
      "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
      "vector_db_config": {
        "ragManagedDb": {
          "ann": {
            "tree_depth": '"${TREE_DEPTH}"',
            "leaf_count": '"${LEAF_COUNT}"'
          }
        }
      }
    }'

Import your data into ANN RagManagedDb

You can use either the ImportRagFiles API or the UploadRagFile API to import your data into the ANN RagManagedDb. However, unlike the KNN retrieval strategy, the ANN approach requires the underlying tree-based index to be rebuilt at least once, and optionally again after you import significant amounts of data, to maintain optimal recall. To have Vertex AI RAG Engine rebuild your ANN index, set rebuild_ann_index to true in your ImportRagFiles API request.

Keep the following in mind:

  1. Before you query the RAG corpus, you must rebuild the ANN index at least once.
  2. Only one concurrent index rebuild is supported on a project in each location.

To upload your local file into your RAG corpus, see Upload a RAG file. To import data into your RAG corpus and trigger an ANN index rebuild, see the following code sample that demonstrates how to import from Cloud Storage. To learn about the supported data sources, see Data sources supported for RAG.

Python

from vertexai import rag
import vertexai

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"
CORPUS_ID = "YOUR_CORPUS_ID"
PATHS = ["gs://my_bucket/my_files_dir"]
REBUILD_ANN_INDEX = REBUILD_ANN_INDEX  # Choose True or False.

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

corpus_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/{CORPUS_ID}"
# This is a non-blocking call.
response = await rag.import_files_async(
    corpus_name=corpus_name,
    paths=PATHS,
    rebuild_ann_index=REBUILD_ANN_INDEX
)

# Wait for the import to complete.
await response.result()
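The sample above uses await at the top level, which works in a notebook or another asynchronous context. In a plain Python script, one way to run the same calls (with the same placeholders) is to wrap them in an async function and drive it with asyncio:

import asyncio

async def import_with_rebuild():
    # Kick off the non-blocking import, then wait for it to complete.
    response = await rag.import_files_async(
        corpus_name=corpus_name,
        paths=PATHS,
        rebuild_ann_index=REBUILD_ANN_INDEX,
    )
    return await response.result()

result = asyncio.run(import_with_rebuild())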

REST

Replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
  • CORPUS_ID: The ID of your RAG corpus.
  • GCS_URI: The Cloud Storage URI of the data to import.
  • REBUILD_ANN_INDEX: Whether to rebuild the ANN index (true or false).
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
CORPUS_ID=CORPUS_ID
GCS_URI=GCS_URI
REBUILD_ANN_INDEX=<true/false>

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "gcs_source": {
      "uris": '\""${GCS_URI}"\"',
      },
    "rebuild_ann_index": '${REBUILD_ANN_INDEX}'
  }
}'

Manage your tier

Vertex AI RAG Engine lets you scale your RagManagedDb instance to match your usage and performance requirements by choosing between two tiers:

  • Enterprise tier (default): This tier offers production-scale performance along with autoscaling. It's suitable for customers with large amounts of data or performance-sensitive workloads.

  • Basic tier: This tier is a cost-effective, low-compute option that might be suitable for the following cases:

    • Experimenting with RagManagedDb.
    • Small data sizes.
    • Latency-insensitive workloads.
    • Using Vertex AI RAG Engine only with other vector databases.

The tier is a project-level setting available under the RagEngineConfig resource, and it affects all RAG corpora that use RagManagedDb. To get or update the tier, use the GetRagEngineConfig and UpdateRagEngineConfig APIs, respectively.

Read your RagEngineConfig

The following sample code demonstrates how to read your RagEngineConfig:

Python

from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

rag_engine_config = rag.rag_data.get_rag_engine_config(
    name=f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
)

print(rag_engine_config)
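If you want to branch on the current tier in code rather than reading the printed output, a small sketch follows. The field names are assumed to mirror the constructor arguments used in the update samples below, so verify them against the printed config:

# Assumed field layout: RagEngineConfig.rag_managed_db_config.tier holds a
# rag.Basic or rag.Enterprise instance (mirrors the update samples below).
tier = rag_engine_config.rag_managed_db_config.tier
if isinstance(tier, rag.Basic):
    print("RagManagedDb is on the Basic tier.")
else:
    print("RagManagedDb is on the Enterprise tier.")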

REST

Replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig

Update your RagEngineConfig to the Enterprise tier

The following code samples demonstrate how to set the RagEngineConfig to the Enterprise tier:

Python

from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

rag_engine_config_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"

new_rag_engine_config = rag.RagEngineConfig(
    name=rag_engine_config_name,
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Enterprise()),
)

updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)

print(updated_rag_engine_config)

REST

Replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
curl -X PATCH \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig -d '{"ragManagedDbConfig": {"enterprise": {}}}'

Update your RagEngineConfig to the Basic tier

If you have a large amount of data in RagManagedDb across all of your RAG corpora, downgrading to the Basic tier might fail: the Basic tier's compute and storage capacity must be able to hold your existing resources. The following code samples demonstrate how to set the RagEngineConfig to the Basic tier.

Python

from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

rag_engine_config_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"

new_rag_engine_config = rag.RagEngineConfig(
    name=rag_engine_config_name,
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Basic()),
)

updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)

print(updated_rag_engine_config)
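Because the downgrade can fail when your existing data doesn't fit within the Basic tier's capacity, you might want to guard the update call. The exact exception type raised by the SDK isn't specified here, so this sketch catches the generic Google API call error from google-api-core:

from google.api_core import exceptions as gapi_exceptions

try:
    updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
        rag_engine_config=new_rag_engine_config
    )
    print(updated_rag_engine_config)
except gapi_exceptions.GoogleAPICallError as e:
    # The downgrade was rejected, for example because the Basic tier can't
    # hold the data currently stored in RagManagedDb.
    print(f"Downgrade to the Basic tier failed: {e}")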

REST

Replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
curl -X PATCH \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig -d '{"ragManagedDbConfig": {"basic": {}}}'

What's next