Use Vertex AI Search as a retrieval backend with Vertex AI RAG Engine

This guide shows you how to use Vertex AI Search as a retrieval backend for Vertex AI RAG Engine.

Vertex AI Search combines deep information retrieval, natural-language processing, and the latest large language model (LLM) features to understand user intent and return the most relevant results.

Using Vertex AI Search as your retrieval backend provides the following benefits for your Vertex AI RAG applications:

  • Enhanced performance and scalability: Vertex AI Search is designed to handle large volumes of data with low latency, which improves response times and performance for your RAG applications.
  • Simplified data management: You can import data from various sources, such as websites, BigQuery datasets, and Cloud Storage buckets, which streamlines your data ingestion process.
  • Seamless integration: Vertex AI has built-in integration with Vertex AI Search, which lets you select it as the corpus backend for your RAG application.
  • Improved LLM output quality: The retrieval capabilities of Vertex AI Search help your RAG application retrieve the most relevant information from your corpus. This leads to more accurate and informative LLM-generated outputs.

Before you begin

To use Vertex AI Search as a retrieval backend, you first need to set up a search data store and application.

  1. Create a search data store.
  2. Create a search application.

Use Vertex AI Search as a retrieval backend for Vertex AI RAG Engine

After you set up your Vertex AI Search application, you can use it as the retrieval backend for your RAG application.

Create a RAG corpus

These code samples show you how to configure Vertex AI Search as the retrieval backend for a RAG corpus.

REST

To use the command line to create a RAG corpus, do the following:

  1. Create a RAG corpus

    Replace the following variables used in the code sample:

    • PROJECT_ID: The ID of your Google Cloud project.
    • LOCATION: The region to process the request.
    • DISPLAY_NAME: The display name of the RAG corpus that you want to create.
    • ENGINE_NAME: The full resource name of the Vertex AI Search engine or Vertex AI Search data store, without the servingConfigs suffix (the request body appends it). For example:

      projects/PROJECT_NUMBER/locations/LOCATION/collections/default_collection/engines/ENGINE_ID

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/ragCorpora" \
    -d '{
      "display_name" : "DISPLAY_NAME",
      "vertex_ai_search_config" : {
        "serving_config": "ENGINE_NAME/servingConfigs/default_search"
      }
    }'
    
  2. Monitor progress

    Replace the following variables used in the code sample:

    • PROJECT_ID: The ID of your Google Cloud project.
    • LOCATION: The region to process the request.
    • OPERATION_ID: The ID of the RAG corpus create operation.

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"
    

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from vertexai import rag
import vertexai

# TODO(developer): Update and uncomment the lines below
# PROJECT_ID = "your-project-id"
# vertex_ai_search_engine_name = "projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/engines/{ENGINE_ID}"
# display_name = "test_corpus"
# description = "Corpus Description"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

# Configure Search
vertex_ai_search_config = rag.VertexAiSearchConfig(
    serving_config=f"{vertex_ai_search_engine_name}/servingConfigs/default_search",
)

corpus = rag.create_corpus(
    display_name=display_name,
    description=description,
    vertex_ai_search_config=vertex_ai_search_config,
)
print(corpus)
# Example response:
# RagCorpus(name='projects/1234567890/locations/us-central1/ragCorpora/1234567890',
# display_name='test_corpus', description='Corpus Description',
# ...

Retrieve contexts using the RAG API

After you create the RAG corpus, you can retrieve relevant contexts from Vertex AI Search by using the RetrieveContexts API.

REST

This code sample demonstrates how to retrieve contexts using REST.

Replace the following variables used in the code sample:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource.

    Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.

  • TEXT: The query text to get relevant contexts.
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
        "rag_corpus": "RAG_CORPUS_RESOURCE"
      }
    },
    "query": {
      "text": "TEXT"
    }
  }'
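
A `retrieveContexts` response nests the retrieved chunks under a `contexts.contexts` list, as the example output later in this guide shows. The following sketch assumes that response shape (field names in camelCase, as REST responses use); `extract_contexts` is a hypothetical helper that pulls out each chunk's source URI and text:

```python
# Illustrative post-processing (response shape assumed from the example
# output in this guide): each retrieved chunk carries its source URI and
# text under contexts.contexts.

def extract_contexts(response: dict) -> list[tuple[str, str]]:
    """Return (source_uri, text) pairs from a retrieveContexts response."""
    chunks = response.get("contexts", {}).get("contexts", [])
    return [(c.get("sourceUri", ""), c.get("text", "")) for c in chunks]

sample_response = {
    "contexts": {
        "contexts": [
            {"sourceUri": "gs://your-bucket-name/file.txt", "text": "Hello."}
        ]
    }
}
print(extract_contexts(sample_response))
```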

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.


from vertexai import rag
import vertexai

# TODO(developer): Update and uncomment the lines below
# PROJECT_ID = "your-project-id"
# corpus_name = "projects/[PROJECT_ID]/locations/us-central1/ragCorpora/[rag_corpus_id]"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=corpus_name,
            # Optional: supply IDs from `rag.list_files()`.
            # rag_file_ids=["rag-file-1", "rag-file-2", ...],
        )
    ],
    text="Hello World!",
    rag_retrieval_config=rag.RagRetrievalConfig(
        top_k=10,
        filter=rag.utils.resources.Filter(vector_distance_threshold=0.5),
    ),
)
print(response)
# Example response:
# contexts {
#   contexts {
#     source_uri: "gs://your-bucket-name/file.txt"
#     text: "....
#   ....

Generate content using the Vertex AI Gemini API

To generate content with Gemini models, call the Vertex AI GenerateContent API. When you specify the RAG_CORPUS_RESOURCE in the request, the API automatically retrieves data from Vertex AI Search.

REST

This code sample demonstrates how to generate content by using REST.

Replace the following variables used in the sample code:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • MODEL_ID: The LLM model to use for content generation. For example, gemini-2.0-flash.
  • GENERATION_METHOD: The LLM method for content generation. For example, generateContent or streamGenerateContent.
  • INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt that is relevant to the documents in Vertex AI Search.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD" \
    -d '{
      "contents": {
        "role": "user",
        "parts": {
          "text": "INPUT_PROMPT"
        }
      },
      "tools": {
        "retrieval": {
          "disable_attribution": false,
          "vertex_rag_store": {
            "rag_resources": {
                "rag_corpus": "RAG_CORPUS_RESOURCE"
              },
            "similarity_top_k": SIMILARITY_TOP_K
          }
        }
      }
    }'
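
As with the create-corpus call, the request body can be assembled programmatically. The following sketch is an illustration, not part of the official samples: `build_generate_content_body` is a hypothetical helper that builds the payload above, with the retrieval tool pointed at your RAG corpus.

```python
# Illustrative helper (not from the official sample): assembles the
# generateContent request body with a Vertex AI Search-backed RAG tool.

def build_generate_content_body(
    prompt: str,
    rag_corpus_resource: str,
    similarity_top_k: int = 10,
) -> dict:
    return {
        "contents": {"role": "user", "parts": {"text": prompt}},
        "tools": {
            "retrieval": {
                "disable_attribution": False,
                "vertex_rag_store": {
                    "rag_resources": {"rag_corpus": rag_corpus_resource},
                    "similarity_top_k": similarity_top_k,
                },
            }
        },
    }

body = build_generate_content_body(
    "Why is the sky blue?",
    "projects/1234567890/locations/us-central1/ragCorpora/1234567890",
)
print(body["contents"]["parts"]["text"])
```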
    

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.


from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

# TODO(developer): Update and uncomment the lines below
# PROJECT_ID = "your-project-id"
# corpus_name = "projects/{PROJECT_ID}/locations/us-central1/ragCorpora/{rag_corpus_id}"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=corpus_name,
                    # Optional: supply IDs from `rag.list_files()`.
                    # rag_file_ids=["rag-file-1", "rag-file-2", ...],
                )
            ],
            rag_retrieval_config=rag.RagRetrievalConfig(
                top_k=10,
                filter=rag.utils.resources.Filter(vector_distance_threshold=0.5),
            ),
        ),
    )
)

rag_model = GenerativeModel(
    model_name="gemini-2.0-flash-001", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("Why is the sky blue?")
print(response.text)
# Example response:
#   The sky appears blue due to a phenomenon called Rayleigh scattering.
#   Sunlight, which contains all colors of the rainbow, is scattered
#   by the tiny particles in the Earth's atmosphere....
#   ...

What's next