Use Vertex AI Vector Search with RAG Engine

This page shows you how to connect your RAG Engine to Vertex AI Vector Search.

RAG Engine is a powerful tool that uses a built-in vector database powered by Spanner to store and manage vector representations of text documents. The vector database enables efficient retrieval of relevant documents based on the documents' semantic similarity to a given query. By integrating Vertex AI Vector Search as an additional vector database with RAG Engine, you can use the capabilities of Vector Search to handle large data volumes with low latency, which improves the performance and scalability of your RAG applications.

Vertex AI Vector Search setup

Vertex AI Vector Search is based on Vector Search technology developed by Google Research. With Vector Search, you can use the same infrastructure that provides a foundation for Google products such as Google Search, YouTube, and Google Play.

To integrate with RAG Engine, an empty Vector Search index is required.

Set up Vertex AI SDK

To prepare Vertex AI Vector Search instances for the RAG application, follow these steps:

  1. To set up the Vertex AI SDK, see Setup.

  2. Set your environment variables to the following:

    PROJECT_ID=YOUR_PROJECT_ID
    LOCATION=YOUR_LOCATION_ID
    
  3. Optional: If you are using Vertex AI Workbench, then it is pre-authenticated, and this step isn't required. Otherwise, to run the notebook, run the following cell to authenticate:

    import sys

    # If it's a Colab runtime, authenticate the user with Google Cloud
    if "google.colab" in sys.modules:
        from google.colab import auth

        auth.authenticate_user()
    
  4. Enable your APIs by entering this command:

    ! gcloud services enable compute.googleapis.com aiplatform.googleapis.com --project "{PROJECT_ID}"

Initialize the aiplatform SDK

To initialize the aiplatform SDK, do the following:

# init the aiplatform package
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)

Create Vector Search index

To create a Vector Search index that's compatible with your RAG corpus, the index must meet the following criteria:

  1. IndexUpdateMethod must be STREAM_UPDATE, see Create stream index.

  2. Distance measure type must be explicitly set to one of the following:

    • DOT_PRODUCT_DISTANCE
    • COSINE_DISTANCE
  3. The dimension of the vector must be consistent with the embedding model that you plan to use in the RAG corpus; the sketch after this list shows one way to confirm the dimension. Other parameters can be tuned based on your requirements.
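For example, you can confirm the dimension by embedding a sample string before you create the index. This is a minimal sketch that assumes the text-embedding-004 model, which returns 768-dimensional vectors; substitute the embedding model that you plan to use in the RAG corpus.

# Confirm that the embedding dimension matches the index dimension.
# Assumes the text-embedding-004 model; substitute your own model.
from vertexai.language_models import TextEmbeddingModel

embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")
embedding = embedding_model.get_embeddings(["dimension check"])[0]
print(len(embedding.values))  # Expect 768 for text-embedding-004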

# create the index
my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="your-display-name",
    description="your-discription",
    dimensions=768,
    approximate_neighbors_count=10,
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    feature_norm_type="UNIT_L2_NORM",
    index_update_method="STREAM_UPDATE",
)

Create Vector Search index endpoint

RAG Engine supports public endpoints.

# create IndexEndpoint
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="your-display-name", public_endpoint_enabled=True
)

Deploy an index to an index endpoint

Before you can do a nearest-neighbor search, the index must be deployed to an index endpoint.

DEPLOYED_INDEX_ID="YOUR_DEPLOYED_INDEX_ID"

my_index_endpoint.deploy_index(index=my_index, deployed_index_id=DEPLOYED_INDEX_ID)

If it's the first time that you're deploying an index to an index endpoint, it takes approximately 30 minutes to automatically build and initiate the backend before the index can be stored. After the first deployment, the index is ready in seconds. To see the status of the index deployment, open the Vector Search Console, select the Index endpoints tab, and choose your index endpoint.
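You can also verify the deployment from the SDK. This is a minimal sketch that assumes the my_index_endpoint object created earlier; the endpoint lists the indexes that are currently deployed to it.

# List the indexes deployed to the endpoint; your index appears here
# after the deployment has finished.
for deployed_index in my_index_endpoint.deployed_indexes:
    print(deployed_index.id, deployed_index.index)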

Identify the resource name of your index and index endpoint, which have the following formats:

  • projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexes/${INDEX_ID}
  • projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexEndpoints/${INDEX_ENDPOINT_ID}

If you aren't sure about the resource name, you can use the following command to check:

print(my_index_endpoint.resource_name)
print(my_index.resource_name)

Use Vertex AI Vector Search in RAG Engine

After the Vector Search instance is set up, follow the steps in this section to set the Vector Search instance as the vector database for the RAG application.

Set the vector database to create a RAG corpus

When you create the RAG corpus, specify only the full INDEX_ENDPOINT_NAME and INDEX_NAME. The RAG corpus is created and automatically associated with the Vector Search index. Validations are performed against the criteria listed in Create Vector Search index. If any of the requirements aren't met, the request is rejected.

Python

from vertexai.preview import rag

CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
index_resource_name = my_index.resource_name
endpoint_resource_name = my_index_endpoint.resource_name
vector_db = rag.VertexVectorSearch(
    index=index_resource_name, index_endpoint=endpoint_resource_name
)
rag_corpus = rag.create_corpus(display_name=CORPUS_DISPLAY_NAME, vector_db=vector_db)

REST

// TODO(developer): Update and un-comment the following lines:
// CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
// Full index/indexEndpoint resource name
// Index: projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexes/${INDEX_ID}
// IndexEndpoint: projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexEndpoints/${INDEX_ENDPOINT_ID}
// INDEX_ENDPOINT_NAME = "YOUR_INDEX_ENDPOINT_RESOURCE_NAME"
// INDEX_NAME = "YOUR_INDEX_RESOURCE_NAME"
// Call CreateRagCorpus API to create a new RagCorpus
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora -d '{
      "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
      "rag_vector_db_config" : {
              "vertex_vector_search": {
                "index":'\""${INDEX_NAME}"\"'
            "index_endpoint":'\""${INDEX_ENDPOINT_NAME}"\"'
              }
        }
  }'

// Call ListRagCorpora API to verify the RagCorpus is created successfully
curl -sS -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora"

Optional: Create RAG corpus without Vector Search information

To create an empty RAG corpus without Vector Search information that you can update later, select one of the code samples:

Python

CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
vector_db = rag.VertexVectorSearch()
rag_corpus = rag.create_corpus(display_name=CORPUS_DISPLAY_NAME, vector_db=vector_db)

REST

// TODO(developer): Update and un-comment the following lines:
// Call CreateRagCorpus API to create a new RAG corpus without the Vector Search information.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora -d '{
      "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
      "rag_vector_db_config" : {
              "vertex_vector_search": {}
        }
  }'

// Call ListRagCorpora API to verify the RagCorpus is created successfully
curl -sS -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora"

After Vector Search resources have been set up, you can update the RAG corpus with the corresponding information.

Python

index_resource_name = my_index.resource_name
endpoint_resource_name = my_index_endpoint.resource_name
vector_db = rag.VertexVectorSearch(index=index_resource_name, index_endpoint=endpoint_resource_name)
updated_rag_corpus = rag.update_corpus(corpus_name=rag_corpus.name, vector_db=vector_db)

REST

// TODO(developer): Update and un-comment the following line:
// RAG_CORPUS_ID = "YOUR_RAG_CORPUS_ID"
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID} -d '{
      "rag_vector_db_config" : {
        "vertex_vector_search": {
          "index": '\""${INDEX_NAME}"\"',
          "index_endpoint": '\""${INDEX_ENDPOINT_NAME}"\"'
        }
      }
  }'

Import files using the RAG API

Use the ImportRagFiles API to import files from Cloud Storage or Google Drive into the Vector Search index. The files are embedded and stored in the Vector Search index.

Python

RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/{LOCATION_ID}/ragCorpora/YOUR_RAG_CORPUS_ID"
GS_BUCKET = "YOUR_GS_BUCKET"

response = rag.import_files(
    corpus_name=RAG_CORPUS_RESOURCE,
    paths=[GS_BUCKET],
    chunk_size=512,  # Optional
    chunk_overlap=100,  # Optional
)

REST

// TODO(developer): Update and un-comment the following lines:
// RAG_CORPUS_ID = "YOUR_RAG_CORPUS_ID"
//
// Google Cloud Storage bucket and file location.
// For example, "gs://rag-fos-test/"
// GCS_URIS= "YOUR_GCS_URIS"

// Call ImportRagFiles API to embed files and store them in the Vector Search index
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "gcs_source": {
      "uris": '\""${GCS_URIS}"\"'
    },
    "rag_file_chunking_config": {
      "chunk_size": 512
    }
  }
}'

// Call ListRagFiles API to verify that the files are imported successfully
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID}/ragFiles
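The preceding samples import files from Cloud Storage. Files in Google Drive can be imported the same way by passing a Drive folder or file URL as a path. This is a minimal Python sketch with a hypothetical folder URL; it assumes the folder has been shared with the service account that RAG Engine uses in your project.

# Hypothetical Google Drive folder URL; replace it with your own folder and
# make sure the folder is shared with the RAG Engine service account.
DRIVE_FOLDER = "https://drive.google.com/drive/folders/YOUR_DRIVE_FOLDER_ID"

response = rag.import_files(
    corpus_name=RAG_CORPUS_RESOURCE,
    paths=[DRIVE_FOLDER],
    chunk_size=512,  # Optional
    chunk_overlap=100,  # Optional
)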

Retrieve relevant contexts using the RAG API

After the file import completes, you can retrieve the relevant context from the Vector Search index by using the RetrieveContexts API.

Python

RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/{LOCATION_ID}/ragCorpora/YOUR_RAG_CORPUS_ID"
RETRIEVAL_QUERY = "YOUR_RETRIEVAL_QUERY"

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=RAG_CORPUS_RESOURCE,
            # Optional: supply IDs from `rag.list_files()`.
            # rag_file_ids=["rag-file-1", "rag-file-2", ...],
        )
    ],
    text=RETRIEVAL_QUERY,
    similarity_top_k=10,  # Optional
    vector_distance_threshold=0.3,  # Optional
)
print(response)

REST

// TODO(developer): Update and un-comment the following lines:
// RETRIEVAL_QUERY="YOUR_RETRIEVAL_QUERY"
//
// Full RagCorpus resource name
// Format:
// "projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID}"
// RAG_CORPUS_RESOURCE="YOUR_RAG_CORPUS_RESOURCE"

// Call RetrieveContexts API to retrieve relevant contexts
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}:retrieveContexts \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
          "rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"',
        },
      "vector_distance_threshold": 0.3
    },    
    "query": {
      "text": '\""${RETRIEVAL_QUERY}"\"', 
      "similarity_top_k": 10
    }
  }'

Generate content using Vertex AI Gemini API

To generate content using Gemini models, make a call to the Vertex AI GenerateContent API. By specifying the RAG_CORPUS_RESOURCE in the request, the API automatically retrieves data from the Vector Search index.

Python

from vertexai.preview.generative_models import GenerativeModel, Tool

RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/{LOCATION_ID}/ragCorpora/YOUR_RAG_CORPUS_ID"

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=RAG_CORPUS_RESOURCE,
                    # Optional: supply IDs from `rag.list_files()`.
                    # rag_file_ids=["rag-file-1", "rag-file-2", ...],
                )
            ],
            similarity_top_k=10,  # Optional
            vector_distance_threshold=0.3,   # Optional
        ),
    )
)

rag_model = GenerativeModel(
    model_name="gemini-1.5-flash-001", tools=[rag_retrieval_tool]
)

GENERATE_CONTENT_PROMPT = "YOUR_GENERATE_CONTENT_PROMPT"

response = rag_model.generate_content(GENERATE_CONTENT_PROMPT)
print(response.text)

REST

// TODO(developer): Update and un-comment the following lines:
// MODEL_ID=gemini-pro
// GENERATE_CONTENT_PROMPT="YOUR_GENERATE_CONTENT_PROMPT"
// RAG_CORPUS_RESOURCE="projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID}"

// GenerateContent with contexts retrieved from the Vector Search index
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json"  https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/publishers/google/models/${MODEL_ID}:generateContent \
-d '{
  "contents": { 
    "role": "user", 
    "parts": { 
      "text": '\""${GENERATE_CONTENT_PROMPT}"\"' 
    } 
  },
  "tools": {
    "retrieval": {
      "vertex_rag_store": {
        "rag_resources": {
            "rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"',
          },
        "similarity_top_k": 8,
        "vector_distance_threshold": 0.32
      }
    }
  }
}'

What's next