This guide shows you how to use Vertex AI Search as a retrieval backend for Vertex AI RAG Engine.
Vertex AI Search brings together deep information retrieval, natural-language processing, and the latest features in large language model (LLM) processing to help understand user intent and return the most relevant results.
Using Vertex AI Search as your retrieval backend provides the following benefits for your Vertex AI RAG applications:
Before you begin
To use Vertex AI Search as a retrieval backend, you first need to set up a search data store and application.
Use Vertex AI Search as a retrieval backend for Vertex AI RAG Engine
After you set up your Vertex AI Search application, you can use it as the retrieval backend for your RAG application.
Create a RAG corpus
These code samples show you how to configure Vertex AI Search as the retrieval backend for a RAG corpus.
REST
To use the command line to create a RAG corpus, do the following:
1. Create a RAG corpus
Replace the following variables used in the code sample:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- DISPLAY_NAME: The display name of the RAG corpus.
- ENGINE_NAME: The Vertex AI Search engine resource. The resulting serving config must resolve to the following format:
projects/PROJECT_NUMBER/locations/LOCATION/collections/default_collection/engines/ENGINE_NAME/servingConfigs/default_search
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/ragCorpora" \
  -d '{
    "display_name" : "DISPLAY_NAME",
    "vertex_ai_search_config" : {
      "serving_config": "ENGINE_NAME/servingConfigs/default_search"
    }
  }'
2. Monitor progress
Replace the following variables used in the code sample:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- OPERATION_ID: The ID of the operation returned by the create request.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
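The create and monitor REST calls in this section can be sketched in Python with only the standard library. This is not the Vertex AI SDK sample. It is a minimal illustration that builds the same URL and JSON body as the curl commands and polls an operation until it reports done. The function names, the `fetch_json` callback, and the example values are hypothetical. The sketch assumes that ENGINE_NAME is whatever prefix makes the serving config match the documented format, and that the operation response carries the standard long-running-operation `done` field.

```python
import json
import time


def build_create_corpus_request(project_id, location, display_name, engine_name):
    """Return (url, body) mirroring the ragCorpora create call in the REST sample."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}/ragCorpora"
    )
    body = {
        "display_name": display_name,
        "vertex_ai_search_config": {
            # Same concatenation as the curl sample: ENGINE_NAME + fixed suffix.
            "serving_config": f"{engine_name}/servingConfigs/default_search"
        },
    }
    return url, body


def build_operation_url(project_id, location, operation_id):
    """Return the URL used by the 'Monitor progress' GET request."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}/operations/{operation_id}"
    )


def wait_for_operation(fetch_json, operation_url, poll_seconds=5.0, max_polls=60):
    """Poll until the operation reports done.

    fetch_json is a hypothetical callable supplied by the caller: it takes a URL,
    performs an authenticated GET, and returns the parsed JSON response as a dict.
    """
    for _ in range(max_polls):
        operation = fetch_json(operation_url)
        if operation.get("done"):
            return operation
        time.sleep(poll_seconds)
    raise TimeoutError(f"Operation not done after {max_polls} polls")


if __name__ == "__main__":
    # Placeholder values only; no network request is made here.
    url, body = build_create_corpus_request(
        "PROJECT_ID", "LOCATION", "DISPLAY_NAME", "ENGINE_NAME"
    )
    print(url)
    print(json.dumps(body, indent=2))
```

To send the request, pass the URL and body to any HTTP client with the same Authorization and Content-Type headers as the curl sample.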
Retrieve contexts using the RAG API
After you create the RAG corpus, you can retrieve relevant contexts from Vertex AI Search by using the RetrieveContexts API.
REST
This code sample demonstrates how to retrieve contexts using REST.
Replace the following variables used in the code sample:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}
- TEXT: The query text to get relevant contexts.
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
        "rag_corpus": "RAG_CORPUS_RESOURCE"
      }
    },
    "query": {
      "text": "TEXT"
    }
  }'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
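As a rough illustration of the retrieval request, the following standard-library sketch builds the same URL and JSON body as the REST sample. It does not use the Vertex AI SDK, and the function names and example values are hypothetical. The resource-name helper follows the documented format projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.

```python
import json


def rag_corpus_resource(project, location, rag_corpus):
    """Compose the RAG corpus resource name in the documented format."""
    return f"projects/{project}/locations/{location}/ragCorpora/{rag_corpus}"


def build_retrieve_contexts_request(project_id, location, rag_corpus, text):
    """Return (url, body) mirroring the retrieveContexts call in the REST sample."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}:retrieveContexts"
    )
    body = {
        "vertex_rag_store": {
            # rag_corpus is the full resource name (RAG_CORPUS_RESOURCE).
            "rag_resources": {"rag_corpus": rag_corpus}
        },
        "query": {"text": text},
    }
    return url, body


if __name__ == "__main__":
    # Placeholder values only; no network request is made here.
    corpus = rag_corpus_resource("PROJECT_ID", "LOCATION", "RAG_CORPUS_ID")
    url, body = build_retrieve_contexts_request("PROJECT_ID", "LOCATION", corpus, "TEXT")
    print(url)
    print(json.dumps(body, indent=2))
```

The request can then be sent with any HTTP client, using the same bearer token and Content-Type headers as the curl command.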
Generate content using the Vertex AI Gemini API
To generate content with Gemini models, call the Vertex AI GenerateContent API. When you specify the RAG_CORPUS_RESOURCE in the request, the API automatically retrieves data from Vertex AI Search.
REST
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_ID: The LLM to use for content generation. For example, gemini-2.0-flash.
- GENERATION_METHOD: The LLM method for content generation. For example, generateContent or streamGenerateContent.
- INPUT_PROMPT: The text that is sent to the LLM for content generation. Try to use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}
- SIMILARITY_TOP_K: Optional. The number of top contexts to retrieve.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD" \
  -d '{
    "contents": {
      "role": "user",
      "parts": {
        "text": "INPUT_PROMPT"
      }
    },
    "tools": {
      "retrieval": {
        "disable_attribution": false,
        "vertex_rag_store": {
          "rag_resources": {
            "rag_corpus": "RAG_CORPUS_RESOURCE"
          },
          "similarity_top_k": SIMILARITY_TOP_K
        }
      }
    }
  }'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
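The generation request can be sketched the same way. The following standard-library example is not the Vertex AI SDK sample. It builds the URL and JSON body from the REST sample and, as one possible design choice, leaves out similarity_top_k when no value is given, because the parameter is documented as optional. The function name and example values are hypothetical.

```python
import json


def build_generate_content_request(
    project_id,
    location,
    model_id,
    rag_corpus,
    prompt,
    generation_method="generateContent",
    similarity_top_k=None,
):
    """Return (url, body) mirroring the generation call in the REST sample."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model_id}:{generation_method}"
    )
    rag_store = {"rag_resources": {"rag_corpus": rag_corpus}}
    if similarity_top_k is not None:
        # Optional: only include the field when the caller sets it.
        rag_store["similarity_top_k"] = similarity_top_k
    body = {
        "contents": {"role": "user", "parts": {"text": prompt}},
        "tools": {
            "retrieval": {
                "disable_attribution": False,
                "vertex_rag_store": rag_store,
            }
        },
    }
    return url, body


if __name__ == "__main__":
    # Placeholder values only; no network request is made here.
    url, body = build_generate_content_request(
        "PROJECT_ID",
        "LOCATION",
        "gemini-2.0-flash",
        "RAG_CORPUS_RESOURCE",
        "INPUT_PROMPT",
        similarity_top_k=5,
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Passing generation_method="streamGenerateContent" changes only the URL suffix; the request body stays the same.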