You can use Vertex AI RAG Engine with several types of models. This page describes the following supported model types:

- Gemini models
- Self-deployed models
- Models with managed APIs on Vertex AI

The following table compares the supported model types:
| Model Type | Description | Use Case |
| --- | --- | --- |
| Gemini models | Natively integrated, multimodal models developed by Google. | Best for general-purpose tasks, leveraging the latest features and optimizations from Google. |
| Self-deployed models | Open-source or custom models that you deploy and manage on your own Vertex AI endpoints. | Ideal for users who require full control over the model, architecture, and serving environment, or who need to use a custom or fine-tuned model. |
| Models with managed APIs | Third-party models, such as Llama and Mistral, offered as fully managed API endpoints on Vertex AI. | Suitable for users who want to use popular third-party models without the overhead of deploying and managing the infrastructure. |
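The code samples in the following sections use the Vertex AI SDK for Python. A minimal initialization sketch is shown below; it assumes the google-cloud-aiplatform package is installed, and PROJECT_ID and LOCATION are placeholders for your own values.

# Initialize the Vertex AI SDK once per session before running the samples
# (PROJECT_ID and LOCATION are placeholders).
import vertexai

vertexai.init(project="PROJECT_ID", location="LOCATION")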
Gemini models

Vertex AI RAG Engine supports Gemini models. Fine-tuned Gemini models are unsupported when the Gemini models use Vertex AI RAG Engine.
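For example, the following is a minimal sketch of wiring a Gemini model to a RAG retrieval tool with the Vertex AI SDK. The corpus resource name and the gemini-2.0-flash-001 model version are placeholder assumptions, not values from this page.

from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool

# Build a retrieval tool backed by an existing RAG corpus (placeholder resource name).
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                )
            ],
            similarity_top_k=10,  # number of retrieved contexts
        ),
    )
)

# Attach the retrieval tool to a Gemini model (model version is an assumption).
rag_model = GenerativeModel("gemini-2.0-flash-001", tools=[rag_retrieval_tool])
response = rag_model.generate_content("Summarize the documents in my corpus.")
print(response.text)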
Self-deployed models

Vertex AI RAG Engine supports all models in Model Garden. To use Vertex AI RAG Engine with a self-deployed model, create a model instance that points to your model's endpoint. Before you run the code sample, replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.

# Create a model instance with your self-deployed open model endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
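The returned rag_model is queried like any other GenerativeModel instance. A short usage sketch, reusing the rag_retrieval_tool built in the Gemini example above and a placeholder prompt:

# Query the self-deployed model through the RAG-augmented model instance.
response = rag_model.generate_content("Summarize the documents in my corpus.")
print(response.text)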
Models with managed APIs on Vertex AI

Models with managed APIs on Vertex AI, such as Llama and Mistral, support Vertex AI RAG Engine.

The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, publishers/meta/models/llama-3.1-405b-instruct-maas, is found in the model card. Before you run the code sample, replace the following variables:

- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

# Create a model instance with Llama 3.1 MaaS endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405b-instruct-maas",
    tools=[RAG_RETRIEVAL_TOOL],
)
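As with the self-deployed example, you can then call rag_model.generate_content() on this instance; the only difference is that the managed model is addressed by its full publisher resource name rather than by a dedicated endpoint.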
The following code sample uses the OpenAI-compatible ChatCompletions API to generate a model response. Before you run the code sample, replace the following variables:

- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
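The sample assumes an OpenAI client that is already authenticated against the Vertex AI OpenAI-compatible endpoint. A minimal setup sketch, assuming the openai and google-auth packages and application default credentials (the base URL format is an assumption based on the Vertex AI OpenAI-compatibility convention):

# Hypothetical client setup: authenticate with application default credentials
# and point the OpenAI client at the Vertex AI OpenAI-compatible endpoint.
import openai
from google.auth import default
from google.auth.transport.requests import Request

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())  # obtain a short-lived access token

client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,
)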
# Generate a response with Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"role": "user", "content": "INPUT_PROMPT"}],
    # The nested extra_body carries the Vertex-specific RAG options.
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {
                        "rag_corpus": "RAG_CORPUS_ID"
                    },
                    "similarity_top_k": 10
                }
            }
        }
    },
)
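The response object follows the standard OpenAI chat completions shape, so the generated text is available as response.choices[0].message.content.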
What's next
Vertex AI RAG Engine supported models