Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Vertex AI RAG Engine supports VPC-SC security controls and CMEK. Data residency and AXT security controls aren't supported.
This page lists the Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.
Gemini models
The following Gemini models and their versions support Vertex AI RAG Engine:
Gemini 2.5 Flash-Lite
Gemini 2.5 Pro
Gemini 2.5 Flash
Gemini 2.0 Flash
Fine-tuned Gemini models aren't supported with Vertex AI RAG Engine.
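The code samples on this page use the Vertex AI SDK for Python. As a point of reference, here is a minimal sketch of how a Gemini model is typically wired to a RAG retrieval tool. The corpus resource name, model ID, and prompt are placeholder assumptions, and the rag module shown is the SDK's preview surface, so the exact interface may vary by SDK version.

    import vertexai
    from vertexai.preview import rag
    from vertexai.preview.generative_models import GenerativeModel, Tool

    # Placeholder project and region; replace with your own values.
    vertexai.init(project="PROJECT_ID", location="LOCATION")

    # Build a retrieval tool backed by an existing RAG corpus.
    # The corpus resource name below is an assumption for illustration.
    rag_retrieval_tool = Tool.from_retrieval(
        retrieval=rag.Retrieval(
            source=rag.VertexRagStore(
                rag_resources=[
                    rag.RagResource(
                        rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                    )
                ],
                similarity_top_k=10,  # number of chunks to retrieve
            ),
        )
    )

    # Attach the tool to a supported Gemini model; the model ID is an example.
    rag_model = GenerativeModel("gemini-2.0-flash-001", tools=[rag_retrieval_tool])
    response = rag_model.generate_content("A question grounded in your corpus documents")
    print(response.text)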
Self-deployed models
Vertex AI RAG Engine supports all models in
Model Garden.
To use Vertex AI RAG Engine with a self-deployed model, create a model instance that points to your model's endpoint.
Before you run the code sample, replace the following variables:
PROJECT_ID: Your project ID.
LOCATION: The region to process your request.
ENDPOINT_ID: Your endpoint ID.
    # Create a model instance with your self-deployed open model endpoint
    rag_model = GenerativeModel(
        "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
        tools=[rag_retrieval_tool],
    )
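The rag_retrieval_tool referenced in this sample is assumed to be constructed as in the Gemini sketch earlier on this page. After the instance is created, querying the self-deployed model follows the same generate_content pattern; a minimal sketch with a placeholder prompt:

    # Query the self-deployed model through the RAG-enabled instance.
    # The prompt is a placeholder; use one relevant to your corpus.
    response = rag_model.generate_content("What do my documents say about this topic?")
    print(response.text)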
Models with managed APIs on Vertex AI
The following models with managed APIs on Vertex AI support Vertex AI RAG Engine:
Mistral on Vertex AI
Llama 3.1 and 3.2
The following code sample demonstrates how to use the Gemini
GenerateContent API to create a generative model instance. The model ID,
publishers/meta/models/llama-3.1-405b-instruct-maas, is found in the
model card.
Before you run the code sample, replace the following variables:
PROJECT_ID: Your project ID.
LOCATION: The region to process your request.
RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.
    # Create a model instance with the Llama 3.1 MaaS endpoint
    rag_model = GenerativeModel(
        "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405b-instruct-maas",
        tools=RAG_RETRIEVAL_TOOL,
    )
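After the instance is created, generating a grounded answer is a single generate_content call. Note that the SDK's tools parameter expects a list, so RAG_RETRIEVAL_TOOL is assumed to be a list of tools here; a minimal sketch with a placeholder prompt:

    # Ask the Llama 3.1 model a question grounded in the RAG corpus.
    response = rag_model.generate_content("Summarize the key points across my documents.")
    print(response.text)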
The following code sample uses the OpenAI-compatible
ChatCompletions API to generate a model response.
Before you run the code sample, replace the following variables:
PROJECT_ID: Your project ID.
LOCATION: The region to process your request.
MODEL_ID: The LLM used for content generation. For
example, meta/llama-3.1-405b-instruct-maas.
INPUT_PROMPT: The text sent to the LLM for content
generation. Use a prompt relevant to the documents in
Vertex AI Search.
RAG_CORPUS_ID: The ID of the RAG corpus resource.
ROLE: The role of the chat message, for example, user.
CONTENT: The text of the message, for example, your INPUT_PROMPT.
    # Generate a response with the Llama 3.1 MaaS endpoint
    response = client.chat.completions.create(
        model="MODEL_ID",
        messages=[{"role": "ROLE", "content": "CONTENT"}],
        extra_body={
            "extra_body": {
                "google": {
                    "vertex_rag_store": {
                        "rag_resources": {
                            "rag_corpus": "RAG_CORPUS_ID"
                        },
                        "similarity_top_k": 10
                    }
                }
            }
        },
    )
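This sample assumes an OpenAI client that is already configured for Vertex AI, which the snippet doesn't show. One common approach, assuming the openai Python package and application-default credentials, is to point the client at the region's OpenAI-compatible Vertex AI endpoint and authenticate with an OAuth token; a minimal sketch:

    import google.auth
    import google.auth.transport.requests
    import openai

    # Obtain a short-lived OAuth token from application-default credentials.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())

    # Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint.
    # PROJECT_ID and LOCATION are placeholders, as in the sample above.
    client = openai.OpenAI(
        base_url=(
            "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
            "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
        ),
        api_key=credentials.token,  # the token expires; refresh as needed
    )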
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-27 UTC."],[],[],null,["# Vertex AI RAG Engine supported models\n\n| The [VPC-SC security controls](/vertex-ai/generative-ai/docs/security-controls) and\n| CMEK are supported by Vertex AI RAG Engine. Data residency and AXT security controls aren't\n| supported.\n\nThis page lists Gemini models, self-deployed models, and models with\nmanaged APIs on Vertex AI that support Vertex AI RAG Engine.\n\nGemini models\n-------------\n\nThe following table lists the Gemini models and their versions that\nsupport Vertex AI RAG Engine:\n\n- [Gemini 2.5 Flash-Lite](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash-lite)\n- [Gemini 2.5 Pro](/vertex-ai/generative-ai/docs/models/gemini/2-5-pro)\n- [Gemini 2.5 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)\n- [Gemini 2.0 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash)\n\nFine-tuned Gemini models are unsupported when the Gemini\nmodels use Vertex AI RAG Engine.\n\nSelf-deployed models\n--------------------\n\nVertex AI RAG Engine supports all models in\n[Model Garden](/vertex-ai/generative-ai/docs/model-garden/explore-models).\n\nUse Vertex AI RAG Engine with your self-deployed open model endpoints.\n\nReplace the variables used in the code sample:\n\n- **\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e**: Your project ID.\n- **\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e**: The region to process your request.\n- **\u003cvar translate=\"no\"\u003eENDPOINT_ID\u003c/var\u003e**: Your endpoint ID.\n\n # Create a model instance with your self-deployed open model endpoint\n rag_model = GenerativeModel(\n \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e/endpoints/\u003cvar translate=\"no\"\u003eENDPOINT_ID\u003c/var\u003e\",\n tools=[rag_retrieval_tool]\n )\n\nModels with managed APIs on Vertex AI\n-------------------------------------\n\nThe models with managed APIs on Vertex AI that support\nVertex AI RAG Engine include the following:\n\n- [Mistral on Vertex AI](/vertex-ai/generative-ai/docs/partner-models/mistral)\n- [Llama 3.1 and 3.2](/vertex-ai/generative-ai/docs/partner-models/llama)\n\nThe following code sample demonstrates how to use the Gemini\n`GenerateContent` API to create a generative model instance. 
The model ID,\n`/publisher/meta/models/llama-3.1-405B-instruct-maas`, is found in the\n[model card](/vertex-ai/generative-ai/docs/model-garden/explore-models).\n\nReplace the variables used in the code sample:\n\n- **\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e**: Your project ID.\n- **\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e**: The region to process your request.\n- **\u003cvar translate=\"no\"\u003eRAG_RETRIEVAL_TOOL\u003c/var\u003e**: Your RAG retrieval tool.\n\n # Create a model instance with Llama 3.1 MaaS endpoint\n rag_model = GenerativeModel(\n \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e/publisher/meta/models/llama-3.1-405B-instruct-maas\",\n tools=\u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eRAG_RETRIEVAL_TOOL\u003c/span\u003e\u003c/var\u003e\n )\n\nThe following code sample demonstrates how to use the OpenAI compatible\n`ChatCompletions` API to generate a model response.\n\nReplace the variables used in the code sample:\n\n- **\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e**: Your project ID.\n- **\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e**: The region to process your request.\n- **\u003cvar translate=\"no\"\u003eMODEL_ID\u003c/var\u003e** : LLM model for content generation. For example, `meta/llama-3.1-405b-instruct-maas`.\n- **\u003cvar translate=\"no\"\u003eINPUT_PROMPT\u003c/var\u003e**: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.\n- **\u003cvar translate=\"no\"\u003eRAG_CORPUS_ID\u003c/var\u003e**: The ID of the RAG corpus resource.\n- **\u003cvar translate=\"no\"\u003eROLE\u003c/var\u003e**: Your role.\n- **\u003cvar translate=\"no\"\u003eUSER\u003c/var\u003e**: Your username.\n- **\u003cvar translate=\"no\"\u003eCONTENT\u003c/var\u003e**: Your content.\n\n # Generate a response with Llama 3.1 MaaS endpoint\n response = client.chat.completions.create(\n model=\"\u003cvar translate=\"no\"\u003eMODEL_ID\u003c/var\u003e\",\n messages=[{\"\u003cvar translate=\"no\"\u003eROLE\u003c/var\u003e\": \"\u003cvar translate=\"no\"\u003eUSER\u003c/var\u003e\", \"content\": \"\u003cvar translate=\"no\"\u003eCONTENT\u003c/var\u003e\"}],\n extra_body={\n \"extra_body\": {\n \"google\": {\n \"vertex_rag_store\": {\n \"rag_resources\": {\n \"rag_corpus\": \"\u003cvar translate=\"no\"\u003eRAG_CORPUS_ID\u003c/var\u003e\"\n },\n \"similarity_top_k\": 10\n }\n }\n }\n },\n )\n\nWhat's next\n-----------\n\n- [Use Embedding models with Vertex AI RAG Engine](/vertex-ai/generative-ai/docs/use-embedding-models)."]]