Filter with natural-language understanding

This page explains how to use natural-language query understanding to automatically create filters from search queries, which can improve the quality of the returned results.

You can use this feature with search apps that are connected to structured data stores.

About natural-language query understanding

If you have a generic search app with structured data, your users' natural-language queries can be rewritten as filtered queries. This can produce better search results than simply matching the words in the query string.

Using natural-language query understanding is easier and more flexible than writing your own filter expressions. For information about writing filter expressions, see Filter generic search for structured or unstructured data.

This feature is best explained through examples:

Example: Field extraction from queries

The following example shows natural-language query understanding applied to a hotel search.

Take the following query made to a structured data store for a hotel site: "Find me a family-friendly hotel with at least four stars that costs less than 300 a night, lets me bring my dog, and has free Wi-Fi."

Without natural-language query understanding, the search app looks for documents that contain the words in the query.

With natural-language query understanding and appropriately structured data, the search can be made more effective by replacing some of the natural language in the query with filters. If the structured data has fields for star_rating (number), price (number), and amenities (string), the query can be rewritten as the following filters plus a residual query:

  • Filters extracted from the natural-language query:

    {
        "star_rating": ≥4,
        "price": ≤300,
        "amenities": ["Wifi", "Pets Allowed"]
    }
    
  • Residual query, reformulated after filters are extracted:

    family-friendly
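For comparison, the following sketch shows roughly what you would otherwise write by hand: a search request that sends the residual query together with an explicit filter expression. It assumes the filter expression syntax described in Filter generic search for structured or unstructured data, and assumes that star_rating and price are numeric fields and amenities is a text field, all marked as indexable (filterable) in your schema. The helper name search_with_manual_filter is hypothetical, and the project and app IDs are placeholders. Set DRY_RUN=1 to print the request instead of sending it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send the residual query plus a hand-written filter,
# roughly equivalent to what natural-language query understanding extracts
# from the hotel query. Assumes star_rating, price, and amenities are
# indexable fields in the data store schema.
# Usage: search_with_manual_filter PROJECT_ID APP_ID
search_with_manual_filter() {
  local project_id="$1" app_id="$2"
  local url="https://discoveryengine.googleapis.com/v1beta/projects/${project_id}/locations/global/collections/default_collection/engines/${app_id}/servingConfigs/default_search:search"
  # ANY() with several values matches documents that have at least one of
  # them, so each required amenity gets its own clause. Quotes inside the
  # filter expression are escaped for JSON.
  local body='{
  "query": "family-friendly",
  "filter": "star_rating >= 4 AND price <= 300 AND amenities: ANY(\"Wifi\") AND amenities: ANY(\"Pets Allowed\")"
}'
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the request instead of sending it.
    printf 'POST %s\n%s\n' "$url" "$body"
    return 0
  fi
  # --fail makes curl return a non-zero status on HTTP errors.
  curl -sS --fail -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "$url" -d "$body"
}
```

For example, `DRY_RUN=1 search_with_manual_filter PROJECT_ID APP_ID` prints the request. Writing and maintaining filters like this for every query shape is the work that natural-language query understanding takes over.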

Example: With a geolocation filter

This example is similar to the preceding one, except that it includes a geolocation filter, which is a special kind of extracted filter. Vertex AI Search can recognize locations in a query and create proximity filters for them.

Take the following query made to a state-wide business site: "Find me a chic and stylish hotel with at least 4 stars that is in San Francisco."

With natural-language query understanding and the geolocation filter, the search is reformulated as filters and a residual query:

  • Filters extracted from the natural-language query, for at least a 4-star rating and within a 10 km radius of San Francisco:

    {
        "star_rating": ≥4,
        "location": GEO_DISTANCE("San Francisco, CA", 10000)
    }
    

    In this example, the location in GEO_DISTANCE is expressed as an address, and the distance (10000) is in meters. In other queries, the location might be expressed as a latitude and longitude, even when the original query contained an address.

  • Residual query, reformulated after filters are extracted:

    chic and stylish in San Francisco

    Although a geolocation filter is created, the place name remains in the residual query. This differs from the other extracted filters, such as star_rating, whose terms are removed from the residual query.
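The extracted geolocation filter keeps results within 10 km of the detected location. To get a feel for which places fall inside that radius, the following sketch computes the great-circle (haversine) distance between two points with awk and compares it to 10,000 meters. It's an illustration that assumes a spherical Earth with a mean radius of 6,371 km; Vertex AI Search might compute distances differently, and the function names haversine_m and within_10km are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: haversine great-circle distance, used to show what
# "within a 10 km radius" means. Not necessarily how Vertex AI Search
# computes distances.
# Usage: haversine_m LAT1 LNG1 LAT2 LNG2   (decimal degrees; prints meters)
haversine_m() {
  awk -v lat1="$1" -v lng1="$2" -v lat2="$3" -v lng2="$4" 'BEGIN {
    pi = atan2(0, -1)
    r = 6371000                      # mean Earth radius in meters
    dlat = (lat2 - lat1) * pi / 180
    dlng = (lng2 - lng1) * pi / 180
    a = sin(dlat / 2) ^ 2 + cos(lat1 * pi / 180) * cos(lat2 * pi / 180) * sin(dlng / 2) ^ 2
    printf "%.0f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# Succeeds (exit status 0) if the two points are at most 10,000 meters
# apart, the fixed radius of the extracted geolocation filter.
# Usage: within_10km LAT1 LNG1 LAT2 LNG2
within_10km() {
  [ "$(haversine_m "$1" "$2" "$3" "$4")" -le 10000 ]
}
```

For example, with approximate coordinates, two points in San Francisco (37.7793, -122.4193 and 37.7955, -122.3937) are about 3 km apart and pass `within_10km`, while a point in San Jose (37.3382, -121.8863) is well outside the radius.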

Limitations

The following limitations apply to natural-language query understanding:

  • Natural-language query understanding can't be applied to blended search apps. You get an error if you try to use natural-language query understanding with a blended search app.

  • Natural-language query understanding works for any generic search app that uses structured data. However, Google doesn't recommend using this feature with the following kinds of structured data:

    • Unstructured data with metadata
    • Website data with structured data

    This feature is unlikely to produce good results with these kinds of structured data, because such structured data usually contains information about the content rather than the content itself.

  • Using natural-language query understanding increases latency, so you might choose not to use it if latency is a problem.

  • For geolocation, the location must be explicitly described. You can't use locations such as "near me" or "home".

  • The radius for geolocation is 10 km and isn't configurable.

Before you begin

Before you start using natural-language query understanding, you have to enable it for data stores connected to the apps that you plan to use.

To enable natural-language query understanding, follow these steps:

REST

  1. Find your data store ID. If you already have your data store ID, skip to the next step.

    1. In the Google Cloud console, go to the Agent Builder page and in the navigation menu, click Data Stores.

    2. Click the name of your data store.

    3. On the Data page for your data store, get the data store ID.

  2. Run the following curl command:

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID?update_mask=natural_language_query_understanding_config.mode" \
    -d '{
          "naturalLanguageQueryUnderstandingConfig": {
            "mode": "ENABLED"
          }
        }'

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
  3. Repeat steps 1 and 2 for each data store.

  4. Wait approximately 24 hours.

    If you try to use natural-language query understanding before the data store is ready, the response is the same as if filterExtractionCondition were set to DISABLED in the search request.
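Because steps 1 and 2 must be repeated for every data store, you might script the PATCH request. The following minimal sketch reuses the request from step 2 in a loop. The function names enable_nlqu and enable_nlqu_all are hypothetical, and the project and data store IDs are placeholders. Set DRY_RUN=1 to print each request instead of sending it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: enable natural-language query understanding on one
# or more data stores with the PATCH request from step 2.

# Enable the feature on a single data store.
# Usage: enable_nlqu PROJECT_ID DATA_STORE_ID
enable_nlqu() {
  local project_id="$1" data_store_id="$2"
  local url="https://discoveryengine.googleapis.com/v1beta/projects/${project_id}/locations/global/collections/default_collection/dataStores/${data_store_id}?update_mask=natural_language_query_understanding_config.mode"
  local body='{"naturalLanguageQueryUnderstandingConfig": {"mode": "ENABLED"}}'
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the request instead of sending it.
    printf 'PATCH %s\n%s\n' "$url" "$body"
    return 0
  fi
  # --fail makes curl return a non-zero status on HTTP errors.
  curl -sS --fail -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: ${project_id}" \
    "$url" -d "$body"
}

# Enable the feature on every data store ID passed after the project ID,
# stopping at the first failure.
# Usage: enable_nlqu_all PROJECT_ID DATA_STORE_ID...
enable_nlqu_all() {
  local project_id="$1" ds
  shift
  for ds in "$@"; do
    enable_nlqu "$project_id" "$ds" || return 1
  done
}
```

For example, `enable_nlqu_all PROJECT_ID DATA_STORE_ID_1 DATA_STORE_ID_2` sends one request per data store. Each data store still needs about 24 hours before the feature takes effect.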

Search, converting natural-language queries into filters

To send a query written in natural language and have filters automatically extracted from it, do the following:

REST

  1. Run the following curl command, which calls the search method:

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
      -d '{
            "query": "QUERY",
            "naturalLanguageQueryUnderstandingSpec": {
              "filterExtractionCondition": "ENABLED"
            }
          }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of the Vertex AI Search app that you want to query. The app must be connected to a data store that contains structured data or metadata. The app can't be a blended search app.
    • QUERY: the query, written in natural language.
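Natural-language queries often contain quotation marks, which break the JSON body if QUERY is pasted in directly. The following sketch wraps the search request in a shell function that escapes backslashes and double quotes before embedding the query. The function names json_escape and nl_search are hypothetical, and the escaping is deliberately minimal (it doesn't handle control characters such as newlines). Set DRY_RUN=1 to print the request instead of sending it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a natural-language search with filter extraction
# enabled. PROJECT_ID and APP_ID are placeholders.

# Escape backslashes and double quotes so the text can be embedded in a
# JSON string. Control characters such as newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: nl_search PROJECT_ID APP_ID QUERY
nl_search() {
  local project_id="$1" app_id="$2" query="$3"
  local url="https://discoveryengine.googleapis.com/v1beta/projects/${project_id}/locations/global/collections/default_collection/engines/${app_id}/servingConfigs/default_search:search"
  local body
  body=$(printf '{"query": "%s", "naturalLanguageQueryUnderstandingSpec": {"filterExtractionCondition": "ENABLED"}}' "$(json_escape "$query")")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the request instead of sending it.
    printf 'POST %s\n%s\n' "$url" "$body"
    return 0
  fi
  curl -sS --fail -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "$url" -d "$body"
}
```

For example, `DRY_RUN=1 nl_search PROJECT_ID APP_ID 'hotel with "free" Wi-Fi under 300'` prints a request whose query field contains the escaped quotes; drop `DRY_RUN=1` to send it.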

Search, converting locations in queries to geolocation filters

To send a query written in natural language and have filters, including proximity filters for locations, automatically extracted from it, do the following:

REST

  1. Run the following curl command, which calls the search method:

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
      -d '{
            "query": "QUERY",
            "naturalLanguageQueryUnderstandingSpec": {
              "filterExtractionCondition": "ENABLED",
              "geoSearchQueryDetectionFieldNames": ["GEO_FIELD_NAME"]
            }
          }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of the Vertex AI Search app that you want to query. The app must be connected to a data store that contains structured data or metadata. The app can't be a blended search app.
    • QUERY: the query, written in natural language.
    • GEO_FIELD_NAME: the name of a field whose value type is geolocation. To use more than one field, list each name as a separate string in the array, for example ["FIELD_NAME_1", "FIELD_NAME_2"]. Field names whose value type isn't geolocation are ignored.
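When you have several geolocation fields, building the field-name list by hand is error-prone. The following sketch takes the field names as extra arguments and sends geoSearchQueryDetectionFieldNames as a JSON array of strings, assuming the API treats this field as a list of field names. The function name nl_geo_search and the field names in the example are hypothetical, and field names are assumed to contain no quotes or backslashes. Set DRY_RUN=1 to print the request instead of sending it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: natural-language search with geolocation filter
# detection on one or more geolocation fields.
# Usage: nl_geo_search PROJECT_ID APP_ID QUERY GEO_FIELD_NAME...
nl_geo_search() {
  local project_id="$1" app_id="$2" query="$3"
  shift 3
  local url="https://discoveryengine.googleapis.com/v1beta/projects/${project_id}/locations/global/collections/default_collection/engines/${app_id}/servingConfigs/default_search:search"

  # Build the contents of a JSON array of field names, such as
  # "location", "office_location". Assumes names need no escaping.
  local fields="" f
  for f in "$@"; do
    fields="${fields:+$fields, }\"$f\""
  done

  # Escape backslashes and double quotes in the query for JSON.
  local q
  q=$(printf '%s' "$query" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')

  local body
  body=$(printf '{"query": "%s", "naturalLanguageQueryUnderstandingSpec": {"filterExtractionCondition": "ENABLED", "geoSearchQueryDetectionFieldNames": [%s]}}' "$q" "$fields")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the request instead of sending it.
    printf 'POST %s\n%s\n' "$url" "$body"
    return 0
  fi
  curl -sS --fail -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "$url" -d "$body"
}
```

For example, `DRY_RUN=1 nl_geo_search PROJECT_ID APP_ID 'chic hotel in San Francisco' location office_location` prints a request with `"geoSearchQueryDetectionFieldNames": ["location", "office_location"]`.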