Grounding Gemini with Elasticsearch
This page explains how to use your Elasticsearch instance to ground Gemini responses in your data.
Grounding involves using public and private datasets to provide context and facts to ground Large Language Model (LLM) responses. By grounding with Elasticsearch, you can take advantage of your existing Elasticsearch indexes to help enhance the quality and reliability of Gemini's output, reducing hallucinations and helping to ensure responses are relevant to your data. This lets you build powerful RAG applications such as:
- Generative search summaries
- Question-and-answer chatbots with enterprise data
- Agents grounded in your data
You can ground a response on up to 10 data sources at a time. You can combine grounding with Elasticsearch with Grounding with Google Search to connect the model with world knowledge, a wide range of possible topics, or up-to-date information on the internet.
Supported models
The following models support grounding with Elasticsearch, using text input only:
- Gemini 2.5 Pro
- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
Set up a search template in Elasticsearch
This section explains how to set up a search template so that grounding can retrieve the data stored in your Elasticsearch instance.
Best practices
For the best grounding responses, follow these principles when creating a search template:
- Include only relevant and useful data. For example, in a product catalog, specifying an image URL might not help the LLM answer prompts about product properties unless the prompt specifically asks for a URL. Similarly, avoid outputting embedding vectors.
- Grounding removes Elasticsearch results with low relevance to your prompts. Provide a higher number of Elasticsearch results to capture all relevant context.
- Results data can be in one field or spread across multiple fields.
Sample templates
You can use your own search templates. However, we recommend the generic kNN search template for Elasticsearch grounding. For additional search templates, see the GitHub repository.
The following template performs a generic kNN semantic search using Vertex AI embeddings.
PUT _scripts/google-template-knn-multioutput
{
  "script": {
    "lang": "mustache",
    "source": {
      "_source": {
        "excludes": [ "title_embedding", "description_embedding", "images" ]
      },
      "size": "{{num_hits}}",
      "knn": [
        {
          "field": "description_embedding",
          "k": 5,
          "num_candidates": 10,
          "query_vector_builder": {
            "text_embedding": {
              "model_id": "googlevertexai_embeddings_004",
              "model_text": "{{query}}"
            }
          },
          "boost": 0.4
        },
        {
          "field": "title_embedding",
          "k": 5,
          "num_candidates": 10,
          "query_vector_builder": {
            "text_embedding": {
              "model_id": "googlevertexai_embeddings_004",
              "model_text": "{{query}}"
            }
          },
          "boost": 0.6
        }
      ]
    }
  }
}
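Before using the template for grounding, you can verify that it renders as expected with Elasticsearch's `_render/template` API, which expands the mustache variables without running a search. The sketch below builds that render request body in Python; the template ID and parameter names match the template above, while the example query text is purely illustrative.

```python
import json

def build_render_request(template_id: str, query: str, num_hits: int) -> dict:
    """Body for POST _render/template, which expands the mustache
    variables (query, num_hits) so you can inspect the final search."""
    return {
        "id": template_id,
        "params": {
            "query": query,
            "num_hits": num_hits,
        },
    }

body = build_render_request("google-template-knn-multioutput",
                            "waterproof hiking boots", 10)
print(json.dumps(body, indent=2))
```

Send this body to `POST _render/template` on your cluster to confirm the rendered search matches what you expect before wiring the template into grounding.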
Generate grounded responses with Elasticsearch
To ground with Elasticsearch in the Google Cloud console, do the following:
1. Go to Vertex AI Studio and open the Create prompt page.
2. In the Settings panel, click the Grounding: Your data toggle.
3. In the Customize Grounding pane, select Elasticsearch.
4. Enter the endpoint in the Elasticsearch endpoint field.
5. Enter the API key in the Elasticsearch API Key field.
6. Enter the index in the Elasticsearch index field.
7. Enter the search template in the Elasticsearch search template field.
8. Adjust the number of hits by sliding the Number of hits slider.
9. Click Save.
10. Enter your prompt.
11. Click Submit.
Understand your response
If your model prompt successfully grounds to Elasticsearch data stores through Vertex AI Studio or the API, the model's responses include metadata with citations and source content. If the model's response has low source relevance or incomplete information, metadata might not be provided and the prompt response won't be grounded.
Use the Vertex AI API to ground
This section explains how to use the Vertex AI API to ground your LLM responses.
Prerequisites
Before you can ground LLM responses with Elasticsearch, you must complete the following:
Enable the Vertex AI API: Ensure that the Vertex AI API is enabled for your Google Cloud project.
Install and sign in to the Google Cloud CLI: Install and initialize the gcloud command-line tool.
Set up Elasticsearch: Use an existing Elasticsearch cluster and index that you want to use for grounding. Obtain the following information from your Elasticsearch setup:
- Endpoint: The URL of your Elasticsearch cluster.
- Index name: The name of the index that you want to search, such as my-data-index.
- API key: An API key that allows access to your Elasticsearch cluster. The API key must start with the prefix ApiKey.
Create an Elasticsearch search template: Use a search template that returns result data for grounding, such as the reference template in the previous section.
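Before calling the API, it can be useful to sanity-check the values you collected above. The helper below is an illustrative sketch, not an official validator; it checks only the requirements stated in these prerequisites, and the example values are hypothetical.

```python
from urllib.parse import urlparse

def validate_elastic_settings(endpoint: str, index: str,
                              api_key_string: str) -> list:
    """Collect obvious configuration problems before sending a request.
    Illustrative only; checks mirror the prerequisites listed above."""
    problems = []
    # The endpoint is the URL of your Elasticsearch cluster.
    if urlparse(endpoint).scheme not in ("http", "https"):
        problems.append("endpoint must be an http(s) URL")
    if not index:
        problems.append("index name is required")
    # The API key string must start with the ApiKey prefix.
    if not api_key_string.startswith("ApiKey "):
        problems.append('API key string must start with the "ApiKey " prefix')
    return problems

print(validate_elastic_settings(
    "https://my-cluster.es.example.com:9243",
    "my-data-index",
    "ApiKey abc123",
))
```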
API access
Use the following instructions to ground Gemini with your Elasticsearch data source using the Vertex AI API.
Prepare a grounded generation request
To send a text prompt and ground it with Elasticsearch, send a POST request to the Vertex AI API. At a minimum, you must provide the request body. Make sure to do the following replacements:
- QUERY: The text prompt to ground.
- ELASTIC_SEARCH_ENDPOINT: The absolute endpoint path for the Elasticsearch resource to use.
- ELASTIC_SEARCH_API_KEY: The API key for the Elasticsearch data endpoint.
- INDEX_NAME: The name of the Elasticsearch index used for grounding.
- SEARCH_TEMPLATE_NAME: The Elasticsearch search template used for grounding.
- NUM_HITS: The number of results returned from the Elasticsearch data source and used for grounding.
Request JSON body:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "QUERY"
        }
      ]
    }
  ],
  "tools": [
    {
      "retrieval": {
        "externalApi": {
          "api_spec": "ELASTIC_SEARCH",
          "endpoint": "ELASTIC_SEARCH_ENDPOINT",
          "apiAuth": {
            "apiKeyConfig": {
              "apiKeyString": "ApiKey ELASTIC_SEARCH_API_KEY"
            }
          },
          "elasticSearchParams": {
            "index": "INDEX_NAME",
            "searchTemplate": "SEARCH_TEMPLATE_NAME",
            "numHits": "NUM_HITS"
          }
        }
      }
    }
  ]
}
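You can also assemble this request body programmatically before writing it to a file. The sketch below mirrors the field names in the JSON above; the endpoint, key, and query values are placeholders for your own.

```python
import json

def build_grounded_request(query, endpoint, api_key, index,
                           search_template, num_hits):
    """Assemble the generateContent request body for Elasticsearch
    grounding, mirroring the JSON structure shown above."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": query}]}
        ],
        "tools": [{
            "retrieval": {
                "externalApi": {
                    "api_spec": "ELASTIC_SEARCH",
                    "endpoint": endpoint,
                    # The ApiKey prefix is required in the key string.
                    "apiAuth": {
                        "apiKeyConfig": {
                            "apiKeyString": f"ApiKey {api_key}"
                        }
                    },
                    "elasticSearchParams": {
                        "index": index,
                        "searchTemplate": search_template,
                        "numHits": num_hits,
                    },
                }
            }
        }],
    }

body = build_grounded_request(
    query="What products are waterproof?",
    endpoint="https://my-cluster.es.example.com:9243",
    api_key="MY_KEY",
    index="my-data-index",
    search_template="google-template-knn-multioutput",
    num_hits="10",
)
with open("request.json", "w") as f:
    json.dump(body, f, indent=2)
```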
For more information on other API fields such as system instructions and multi-turn chats, see Generative AI beginner's guide.
Send the API request
Save the request body in a file named request.json.
Then execute the POST API request, and do the following replacements:
- LOCATION: The region to process the request. For more information on available locations, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID. For more information on project IDs, see Creating and managing projects.
- MODEL_ID: The model ID of the multimodal model.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent"
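The same call can be made from Python with the standard library. This is a minimal sketch of the curl command above; it assumes you obtain an access token separately (for example, from `gcloud auth print-access-token`), and the example location, project, and model IDs are placeholders.

```python
import json
import urllib.request

def build_generate_content_url(location, project_id, model_id):
    """URL for the v1beta1 generateContent endpoint shown above."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model_id}:generateContent"
    )

def send_request(url, access_token, body):
    """POST the request body with the same headers as the curl command."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

url = build_generate_content_url("us-central1", "my-project", "gemini-2.0-flash")
print(url)
```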
You should receive a JSON response similar to the following:
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Based on the information ..."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [ "..." ],
      "groundingMetadata": {
        "groundingChunks": [
          {
            "retrievedContext": {
              "text": "ipsum lorem ..."
            }
          },
          {...},
          {...}
        ],
        "groundingSupports": [
          {
            "segment": {
              "startIndex": 25,
              "endIndex": 147,
              "text": "ipsum lorem ..."
            },
            "groundingChunkIndices": [1, 2],
            "confidenceScores": [0.6626542, 0.82018316]
          }
        ]
      }
    }
  ]
}
Understand your response
The response includes the LLM-generated text, which is called a candidate. If your model prompt successfully grounds to your Elasticsearch data source, then the response includes grounding metadata, which identifies the parts of the response that were derived from your Elasticsearch data. This metadata might not be provided, and the prompt response won't be grounded, for several reasons, including low source relevance or incomplete information within the model's response.
The following is a breakdown of the output data:
- Role: Indicates the sender of the grounded answer. Because the response always contains grounded text, the role is always model.
- Text: The grounded answer generated by the LLM.
- Grounding metadata: Information about the grounding source, which contains the following elements:
  - Grounding chunks: A list of results from your Elasticsearch index that support the answer.
  - Grounding supports: Information about a specific claim within the answer that can be used to show citations:
    - Segment: The part of the model's answer that is substantiated by a grounding chunk.
    - Grounding chunk indices: The indices in the grounding chunks list that correspond to this claim.
    - Confidence scores: A number from 0 to 1 that indicates how grounded the claim is in the provided set of grounding chunks.
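The breakdown above maps directly to simple citation handling in client code. The sketch below walks groundingSupports and joins each claim to its supporting chunk texts; the field names follow the sample response, while the helper name and sample data are illustrative.

```python
def extract_citations(response: dict) -> list:
    """Map each grounded segment to the chunk texts that support it,
    using the groundingMetadata fields described above."""
    citations = []
    for candidate in response.get("candidates", []):
        meta = candidate.get("groundingMetadata", {})
        chunks = meta.get("groundingChunks", [])
        for support in meta.get("groundingSupports", []):
            # groundingChunkIndices point into the groundingChunks list.
            sources = [
                chunks[i].get("retrievedContext", {}).get("text", "")
                for i in support.get("groundingChunkIndices", [])
                if i < len(chunks)
            ]
            citations.append({
                "claim": support.get("segment", {}).get("text", ""),
                "confidence": support.get("confidenceScores", []),
                "sources": sources,
            })
    return citations

# Hypothetical response shaped like the sample output above.
sample = {
    "candidates": [{
        "groundingMetadata": {
            "groundingChunks": [
                {"retrievedContext": {"text": "chunk 0"}},
                {"retrievedContext": {"text": "chunk 1"}},
            ],
            "groundingSupports": [{
                "segment": {"startIndex": 0, "endIndex": 10, "text": "a claim"},
                "groundingChunkIndices": [1],
                "confidenceScores": [0.82],
            }],
        }
    }]
}
print(extract_citations(sample))
```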
What's next
- To learn how to send chat prompt requests, see Multiturn chat.
- To learn about responsible AI best practices and Vertex AI's safety filters,
see Safety best practices.