Retrieval-augmented generation (RAG) is a technique that retrieves relevant information and provides it to LLMs so they can generate verifiable responses. The information can include fresh information, a topic and context, or ground truth.
This page shows you how to use Vertex AI RAG Engine with the multimodal Live API, which lets you specify a RAG corpus and retrieve information from it.
Prerequisites
The following prerequisites must be completed before you can use Vertex AI RAG Engine with the multimodal Live API:
Enable the RAG API in Vertex AI.
To upload files to a RAG corpus, see the Import RAG files example API.
Set up
To use Vertex AI RAG Engine with the Live API, specify Vertex AI RAG Engine as a tool. The following code sample shows how:
Replace the following variables:
- YOUR_PROJECT_ID: The ID of your Google Cloud project.
- YOUR_CORPUS_ID: The ID of your corpus.
- YOUR_LOCATION: The region to process the request.
PROJECT_ID = "YOUR_PROJECT_ID"
RAG_CORPUS_ID = "YOUR_CORPUS_ID"
LOCATION = "YOUR_LOCATION"

TOOLS = {
    "retrieval": {
        "vertex_rag_store": {
            "rag_resources": {
                # Full resource name of the RAG corpus to retrieve from
                "rag_corpus": f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/{RAG_CORPUS_ID}"
            }
        }
    }
}
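The `rag_corpus` field expects the full resource name of your corpus, not just its ID. As a sketch, a small helper (hypothetical, not part of any Google SDK) keeps the format in one place:

```python
def rag_corpus_resource_name(project_id: str, location: str, corpus_id: str) -> str:
    """Hypothetical helper: builds the full RAG corpus resource name
    expected by the "rag_corpus" field of the retrieval tool."""
    return f"projects/{project_id}/locations/{location}/ragCorpora/{corpus_id}"

# Example values are illustrative
name = rag_corpus_resource_name("my-project", "us-central1", "1234567890")
```

Using a helper like this avoids the easy mistake of passing the bare corpus ID where the API requires the full `projects/.../locations/.../ragCorpora/...` path.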
Use a WebSocket for real-time communication
To enable real-time communication between a client and a server, you must use a WebSocket. The following code samples demonstrate how to use a WebSocket with the Python API and the Python SDK.
Python API
import json

from IPython.display import Markdown, display
# Requires the `websockets` package (new asyncio implementation)
from websockets.asyncio.client import connect

CONFIG = {"response_modalities": ["TEXT"], "speech_config": {"language_code": "en-US"}}

# bearer_token: a list whose first element is a valid OAuth 2.0 access
# token, for example from `gcloud auth print-access-token`
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {bearer_token[0]}",
}

HOST = f"{LOCATION}-aiplatform.googleapis.com"
SERVICE_URL = f"wss://{HOST}/ws/google.cloud.aiplatform.v1beta1.LlmBidiService/BidiGenerateContent"
MODEL = "gemini-2.0-flash-exp"

# Connect to the server
async with connect(SERVICE_URL, additional_headers=headers) as ws:
    # Set up the session
    await ws.send(
        json.dumps(
            {
                "setup": {
                    "model": MODEL,
                    "generation_config": CONFIG,
                    # Set up RAG as a retrieval tool
                    "tools": TOOLS,
                }
            }
        )
    )

    # Receive the setup response
    raw_response = await ws.recv(decode=False)
    setup_response = json.loads(raw_response.decode("ascii"))

    # Send a text message
    text_input = "What are popular LLMs?"
    display(Markdown(f"**Input:** {text_input}"))

    msg = {
        "client_content": {
            "turns": [{"role": "user", "parts": [{"text": text_input}]}],
            "turn_complete": True,
        }
    }

    await ws.send(json.dumps(msg))

    responses = []

    # Receive chunks of the server response
    async for raw_response in ws:
        response = json.loads(raw_response.decode())
        server_content = response.pop("serverContent", None)
        if server_content is None:
            break

        model_turn = server_content.pop("modelTurn", None)
        if model_turn is not None:
            parts = model_turn.pop("parts", None)
            if parts is not None:
                display(Markdown(f"**parts >** {parts}"))
                responses.append(parts[0]["text"])

        # End of turn
        turn_complete = server_content.pop("turnComplete", None)
        if turn_complete:
            grounding_metadata = server_content.pop("groundingMetadata", None)
            if grounding_metadata is not None:
                grounding_chunks = grounding_metadata.pop("groundingChunks", None)
                if grounding_chunks is not None:
                    for chunk in grounding_chunks:
                        display(Markdown(f"**grounding_chunk >** {chunk}"))
            break

    # Print the server response
    display(Markdown(f"**Response >** {''.join(responses)}"))
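The receive loop above mixes transport handling with message parsing. If it helps, the parsing can be factored into a small stdlib-only helper; this is a sketch, and the sample messages below only mimic the wire format shown above, not an exhaustive schema:

```python
def extract_turn(messages):
    """Collects text parts and grounding chunks from decoded Live API
    messages shaped like the ones handled in the loop above."""
    texts, chunks = [], []
    for response in messages:
        server_content = response.get("serverContent")
        if server_content is None:
            break
        model_turn = server_content.get("modelTurn")
        if model_turn:
            for part in model_turn.get("parts", []):
                if "text" in part:
                    texts.append(part["text"])
        if server_content.get("turnComplete"):
            metadata = server_content.get("groundingMetadata") or {}
            chunks.extend(metadata.get("groundingChunks", []))
            break
    return "".join(texts), chunks

# Illustrative messages mimicking the wire format used above
sample = [
    {"serverContent": {"modelTurn": {"parts": [{"text": "LLMs are "}]}}},
    {"serverContent": {"modelTurn": {"parts": [{"text": "large language models."}]}}},
    {
        "serverContent": {
            "turnComplete": True,
            "groundingMetadata": {
                "groundingChunks": [{"retrievedContext": {"title": "intro.txt"}}]
            },
        }
    },
]
text, grounding = extract_turn(sample)
```

Separating parsing this way makes the accumulation and grounding-metadata logic easy to test without a live connection.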
Python SDK
To learn how to install the generative AI SDK, see Install a library:
from google import genai
from google.genai.types import Content, LiveConnectConfig, Modality, Part
from IPython import display

MODEL = "gemini-2.0-flash-exp"

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

async with client.aio.live.connect(
    model=MODEL,
    config=LiveConnectConfig(
        response_modalities=[Modality.TEXT],
        tools=TOOLS,
    ),
) as session:
    text_input = "What are core LLM techniques?"
    print("> ", text_input, "\n")

    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    async for message in session.receive():
        if message.text:
            display.display(display.Markdown(message.text))
            continue
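The `session.receive()` loop above streams partial text, so a complete answer is usually accumulated across chunks. The pattern can be sketched with a simulated stream; the generator below is a stand-in for illustration, not the google-genai API:

```python
import asyncio

async def simulated_stream():
    # Stand-in for a streaming session: yields text chunks as they arrive
    for chunk in ["Core techniques include ", "prompting, ", "RAG, and fine-tuning."]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield chunk

async def collect(stream):
    # Same accumulation pattern as the receive loop above
    parts = []
    async for text in stream:
        parts.append(text)
    return "".join(parts)

answer = asyncio.run(collect(simulated_stream()))
```

The same `collect` coroutine works for any async iterator of text chunks, which keeps the accumulation logic testable without a live session.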
What's next
- To learn more about Vertex AI RAG Engine, see Vertex AI RAG Engine overview.
- To learn more about the RAG API, see Vertex AI RAG Engine API.
- To manage your RAG corpora, see Corpus management.
- To manage your RAG files, see File management.
- To learn how to use the Vertex AI SDK to run Vertex AI RAG Engine tasks, see RAG quickstart for Python.