This page explains reranking and the available types of rankers, and demonstrates how to use the Vertex AI ranking API to rerank your retrieved responses.

To use the Vertex AI ranking API, you must enable the Discovery Engine API. All supported models are listed in Improve search and RAG quality with ranking API.

Available rerankers
| Ranker option | Description | Latency | Accuracy | Pricing |
| --- | --- | --- | --- | --- |
| Vertex AI ranking API | A standalone semantic reranker designed for highly precise relevance scoring and low latency. For more information, see Improve search and RAG quality with ranking API. | Very low (less than 100 milliseconds) | State-of-the-art performance | Per Vertex AI RAG Engine request |
| LLM reranker | Uses a separate call to Gemini to assess the relevance of chunks to a query. | High (1 to 2 seconds) | Model dependent | LLM token pricing |
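At a high level, both rankers do the same job: score each retrieved chunk against the query and reorder the chunks by that score before they reach the model. A minimal, self-contained sketch of that flow follows; the word-overlap scorer is a hypothetical stand-in for a call to the ranking API or to Gemini:

```python
from typing import Callable

def rerank(query: str, chunks: list[str],
           score: Callable[[str, str], float], top_k: int) -> list[str]:
    """Reorder retrieved chunks by relevance score, keeping the top_k best."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]

def overlap_score(query: str, chunk: str) -> float:
    """Hypothetical scorer: fraction of query words found in the chunk.
    A real reranker would call the Vertex AI ranking API or Gemini instead."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

chunks = [
    "The stock market closed higher today.",
    "Rayleigh scattering makes the sky appear blue.",
    "Blue light is scattered more than red light in the sky.",
]
print(rerank("why is the sky blue", chunks, overlap_score, top_k=2))
```

With a stronger scorer the ordering improves, but the surrounding logic (score, sort, truncate to `top_k`) stays the same; that is exactly what the `ranking` field in `RagRetrievalConfig` configures on the server side.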
Use the Vertex AI ranking API
These code samples demonstrate how to enable reranking with the Vertex AI ranking API in the tool configuration.

Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Replace the following variables used in the sample code:

- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_NAME: The Gemini model used to generate content, for example, gemini-2.0-flash.
- INPUT_PROMPT: The prompt sent to the model.
- RAG_CORPUS_RESOURCE: The name of your RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- RANKER_MODEL_NAME: The model used for ranking, for example, semantic-ranker-default@latest.
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"

# Initialize the Vertex AI SDK once per session.
vertexai.init(project=PROJECT_ID, location="LOCATION")

# Configure retrieval to rerank the top results with the Vertex AI ranking API.
config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        rank_service=rag.RankService(
            model_name="RANKER_MODEL_NAME"
        )
    )
)

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config
        ),
    )
)

rag_model = GenerativeModel(
    model_name="MODEL_NAME", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("INPUT_PROMPT")
print(response.text)
# Example response:
#   The sky appears blue due to a phenomenon called Rayleigh scattering.
#   Sunlight, which contains all colors of the rainbow, is scattered
#   by the tiny particles in the Earth's atmosphere....
#   ...
REST
To generate content using Gemini models, make a call to the Vertex AI GenerateContent API. By specifying the RAG_CORPUS_RESOURCE when you make the request, the model automatically retrieves data from the Vertex AI RAG Engine.

Replace the following variables used in the sample code:

- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_NAME: The Gemini model used to generate content, for example, gemini-2.0-flash.
- GENERATION_METHOD: The content-generation method: generateContent or streamGenerateContent.
- INPUT_PROMPT: The prompt sent to the model.
- SIMILARITY_TOP_K: The number of top contexts to retrieve.
- RAG_CORPUS_RESOURCE: The name of your RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- RANKER_MODEL_NAME: The model used for ranking, for example, semantic-ranker-default@latest.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_NAME:GENERATION_METHOD" \
  -d '{
    "contents": {
      "role": "user",
      "parts": {
        "text": "INPUT_PROMPT"
      }
    },
    "tools": {
      "retrieval": {
        "disable_attribution": false,
        "vertex_rag_store": {
          "rag_resources": {
            "rag_corpus": "RAG_CORPUS_RESOURCE"
          },
          "rag_retrieval_config": {
            "top_k": SIMILARITY_TOP_K,
            "ranking": {
              "rank_service": {
                "model_name": "RANKER_MODEL_NAME"
              }
            }
          }
        }
      }
    }
  }'
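The request body above can also be built programmatically, which avoids shell-quoting mistakes and makes the placeholder values easier to manage. A sketch, assuming the same field layout as the curl sample (the project and corpus values shown are hypothetical placeholders; send the result with any HTTP client plus an OAuth access token):

```python
import json

def build_generate_content_body(prompt: str, rag_corpus: str,
                                ranker_model: str, top_k: int) -> str:
    """Build the JSON body for a GenerateContent request with reranking."""
    body = {
        "contents": {"role": "user", "parts": {"text": prompt}},
        "tools": {
            "retrieval": {
                "disable_attribution": False,
                "vertex_rag_store": {
                    "rag_resources": {"rag_corpus": rag_corpus},
                    "rag_retrieval_config": {
                        "top_k": top_k,
                        "ranking": {
                            "rank_service": {"model_name": ranker_model}
                        },
                    },
                },
            }
        },
    }
    return json.dumps(body, indent=2)

print(build_generate_content_body(
    "Why is the sky blue?",
    "projects/my-project/locations/us-central1/ragCorpora/1234",  # placeholder
    "semantic-ranker-default@latest",
    top_k=10,
))
```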
Use the LLM reranker in Vertex AI RAG Engine
This section presents the prerequisites and code samples for using an LLM reranker. The LLM reranker supports only Gemini models, which are accessible when the Vertex AI RAG Engine API is enabled. To view the list of supported models, see Gemini models. To retrieve relevant contexts using the Vertex AI RAG Engine API, call the retrieval query method as shown in the following samples.

Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Replace the following variables used in the code sample:

- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_NAME: The Gemini model used for reranking.
- RAG_CORPUS_RESOURCE: The name of your RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- TEXT: The query text used to retrieve relevant contexts.
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME = "MODEL_NAME"

# Initialize the Vertex AI SDK once per session.
vertexai.init(project=PROJECT_ID, location="LOCATION")

# Configure retrieval to rerank the top results with a Gemini-based LLM ranker.
rag_retrieval_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    )
)

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=CORPUS_NAME,
        )
    ],
    text="TEXT",
    rag_retrieval_config=rag_retrieval_config,
)
print(response)
# Example response:
#   contexts {
#     contexts {
#       source_uri: "gs://your-bucket-name/file.txt"
#       text: "....
#   ....
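The response nests the matched chunks under contexts.contexts, each carrying a source_uri and text, as shown in the example output above. A small sketch of post-processing such a response once it has been parsed into a dict (the per-context relevance score field is an assumption here, so the sort falls back to the original order when it is absent):

```python
def top_sources(resp: dict, n: int = 3) -> list[str]:
    """Return the source URIs of the n highest-scoring retrieved contexts."""
    contexts = resp["contexts"]["contexts"]
    # "score" is an assumed field name; contexts without it keep score 0.0.
    ranked = sorted(contexts, key=lambda c: c.get("score", 0.0), reverse=True)
    return [c["source_uri"] for c in ranked[:n]]

# Toy response mirroring the structure printed by the retrieval query above.
parsed_response = {
    "contexts": {
        "contexts": [
            {"source_uri": "gs://bucket/a.txt", "text": "...", "score": 0.42},
            {"source_uri": "gs://bucket/b.txt", "text": "...", "score": 0.91},
        ]
    }
}
print(top_sources(parsed_response, n=1))
```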
REST
Replace the following variables used in the code sample:

- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_NAME: The Gemini model used for reranking.
- RAG_CORPUS_RESOURCE: The name of your RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- TEXT: The query text used to retrieve relevant contexts.
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
        "rag_corpus": "RAG_CORPUS_RESOURCE"
      }
    },
    "query": {
      "text": "TEXT",
      "rag_retrieval_config": {
        "top_k": 10,
        "ranking": {
          "llm_ranker": {
            "model_name": "MODEL_NAME"
          }
        }
      }
    }
  }'
What's next
Reranking for Vertex AI RAG Engine
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-26 UTC.