Filter website search

This page explains how to filter search queries for a search app with website data.

Before you begin

Make sure you have created an app and data store and have ingested website data into your data store. For more information, see Create a search app.

About filter expressions

Use filter expressions to construct your website search filters. The syntax depends on whether advanced website indexing is turned on, so refer to whichever of the following sections matches your setup: standard website indexing or advanced website indexing.

Filter expressions with standard website indexing

This section explains filter expression behavior with standard website indexing (advanced website indexing is turned off).

Syntax for standard website indexing

The following Extended Backus–Naur form (EBNF) summarizes filter expression syntax for constructing a website search filter when you have standard website indexing. The value after the colon in a filter must be enclosed in double quotes; this is strictly enforced.

  # A single expression or multiple expressions that are joined by "AND".
  filter = expression, { "AND", expression };
  # A simple expression applying to a text url string.
  expression = filter_key, ":", \"text_value\";
  filter_key = (cr | highRange | lowRange | fileType | lr | rights | siteSearch);
  text_value = string value to filter on;
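
The grammar above can be sketched as a small Python helper. This is a minimal illustration, not part of any Google client library: the name `build_standard_filter` is hypothetical, and rejecting values that contain double quotes is an assumption, since this page does not describe how to escape them.

```python
# Keys allowed by the grammar for standard website indexing.
STANDARD_KEYS = {
    "cr", "highRange", "lowRange", "fileType", "lr", "rights", "siteSearch",
}


def build_standard_filter(conditions):
    """Join key/value pairs into one filter string using "AND".

    `conditions` is a dict such as {"cr": "countryUS"}.
    Hypothetical helper written for illustration only.
    """
    if not conditions:
        raise ValueError("at least one expression is required")
    parts = []
    for key, value in conditions.items():
        if key not in STANDARD_KEYS:
            raise ValueError(f"unsupported filter key: {key}")
        if '"' in value:
            # Assumption: escaping is not documented here, so reject quotes.
            raise ValueError("value must not contain double quotes")
        # The value after the colon is always wrapped in double quotes.
        parts.append(f'{key}:"{value}"')
    return " AND ".join(parts)


print(build_standard_filter({"fileType": ".pdf", "lr": "lang_en"}))
# fileType:".pdf" AND lr:"lang_en"
```

Validating keys on the client side is a design choice for the sketch; it surfaces typos such as `filetype` before a request is sent.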

Available fields for standard website indexing

Here are the fields that are available for filtering your website search when you have standard website indexing:

  • cr (string) Restricts search results to documents from a specific country. For a list of supported values, see Country Collection Values.

  • highRange (string) Specifies the upper limit of the search range. If a document contains a number, the number must be at or below the value of highRange for that document to be included in the response. Specify both highRange and lowRange to create a search query within the range of these parameters.

  • lowRange (string) Specifies the lower limit of the search range. If a document contains a number, the number must be at or above the value of lowRange for that document to be included in the response. Specify both lowRange and highRange to create a search query within the range of these parameters.

  • fileType (string) Restricts search results to documents with a specified extension. For a list of supported file types, see File types indexable by Google.

  • lr (string) Restricts search results to documents written in a specified language. For a list of supported languages, see Query parameters (lr).

  • rights (string) Filters search results based on licensing. For supported values, see Query parameters (rights).

  • siteSearch (string) Specifies a URL pattern for the web pages that your query should search through.

Examples for standard website indexing

Here are some filter examples for standard website indexing:

  • {"filter": "cr:\"countryUS\" AND siteSearch:\"https://example.com/example_domain\""}

    Filters for documents that are: (1) From the US, and (2) In the domain https://example.com/example_domain.

  • {"filter": "fileType:\".pdf\" AND lr:\"lang_en\""}

    Filters for documents that are: (1) PDF files, and (2) In English.

  • {"filter": "rights:\"cc_publicdomain\""}

    Filters for documents that are in the public domain.
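
In the examples above, the backslashes are JSON escapes for the double quotes that the filter syntax requires. As a sketch, Python's standard `json` module adds those escapes automatically when the request body is serialized (the variable names here are illustrative):

```python
import json

# The filter as the API receives it: plain double quotes around the value.
filter_expression = 'rights:"cc_publicdomain"'

# json.dumps escapes the inner quotes, producing the form shown in the examples.
body = json.dumps({"filter": filter_expression})
print(body)  # {"filter": "rights:\"cc_publicdomain\""}
```

Building the body from a dictionary rather than by hand avoids mistakes with nested quoting.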

Filter expressions with advanced website indexing

This section explains filter expression behavior with advanced website indexing (advanced website indexing is turned on).

Syntax for advanced website indexing

The following Extended Backus–Naur form (EBNF) summarizes filter expression syntax for constructing a website search filter when you have advanced website indexing. Filter expressions for advanced website search are limited to URL patterns. Wildcards are allowed and are denoted by the * character. The value after the colon in a filter must be enclosed in double quotes; this is strictly enforced.

  # A single expression or multiple expressions that are joined by "OR".
  filter = expression, { "OR", expression };
  # Expressions can be prefixed with "-" or "NOT" to express a negation.
  # A simple expression applying to a text url string.
  expression = [ "-" | "NOT " ], filter_key, ":", \"url_string\";
  filter_key = siteSearch;
  url_string = double quoted string representing a URL;
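
As a rough sketch of the grammar above, the following Python helpers assemble `siteSearch` expressions and join them with "OR". The function names are hypothetical and not part of any Google client library. Rejecting patterns that contain double quotes is an assumption, because this page does not describe escaping.

```python
def site_search(url_pattern, negate=False):
    """Return one siteSearch expression, optionally prefixed with "NOT "."""
    if '"' in url_pattern:
        # Assumption: escaping is not documented here, so reject quotes.
        raise ValueError("URL pattern must not contain double quotes")
    prefix = "NOT " if negate else ""
    # The URL pattern is always wrapped in double quotes; * is a wildcard.
    return f'{prefix}siteSearch:"{url_pattern}"'


def build_advanced_filter(expressions):
    """Join one or more expressions into a filter string using "OR"."""
    if not expressions:
        raise ValueError("at least one expression is required")
    return " OR ".join(expressions)


print(build_advanced_filter([
    site_search("https://altostrat.com/subdomain/pages/*"),
    site_search("http://cymbalgroup.com/pages/*"),
]))
print(site_search("https://example.com/private/*", negate=True))
```

The second call only demonstrates the negated form permitted by the grammar; the URL in it is a made-up example.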

Available fields for advanced website indexing

When you have advanced website indexing, you can filter your website search using these fields:

  • siteSearch (string): A URL pattern for the web pages that your query should search through.
  • meta tag names and PageMap Attribute names: Structured data from your web pages that can be added to your data store's schema to make the fields searchable, retrievable, and indexable. For more information, see Use structured data for advanced website indexing.

Examples for advanced website indexing

Here are some filter examples for advanced website indexing:

  • {"filter": "siteSearch:\"https://example.com/example_domain\""}

    Filters for documents that are in the domain https://example.com/example_domain. For example, https://example.com/example_domain/index.html.

  • {"filter": "siteSearch:\"https://example.com/subdomains/*\""}

    Filters for documents whose URLs match https://example.com/subdomains/*. For example, https://example.com/subdomains/example_subdomain_page.

  • {"filter": "siteSearch:\"https://altostrat.com/subdomain/pages/*\" OR siteSearch:\"http://cymbalgroup.com/pages/*\""}

    Filters for documents whose URLs match the first or second URL pattern. For example, https://altostrat.com/subdomain/pages/title_page, http://cymbalgroup.com/pages/title_page, or https://altostrat.com/subdomain/pages/INFO.

For examples of meta tag names and PageMap Attribute names used for indexing, see Example use case for meta tags and Example use case for PageMaps.

To filter a website search, follow these steps:

  1. Find your app ID. If you already have your app ID, skip to the next step.

    1. In the Google Cloud console, go to the Agent Builder page.

      Go to Apps

    2. On the Apps page, find the name of your app and get the app's ID from the ID column.

  2. To filter a website search, use the filter field with the engines.servingConfigs.search method.

    curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
    -d '{
      "servingConfig": "projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search",
      "query": "QUERY",
      "filter": "FILTER"
    }'
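
For callers who prefer a script over curl, the same request could be assembled with Python's standard library. This is a sketch under stated assumptions: `build_search_request` is a hypothetical helper, the access token is assumed to have been obtained separately (for example, from the gcloud command shown above), and the project, app, and token values are placeholders. The endpoint, URL path, and body fields are copied from the curl command.

```python
import json
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1"


def build_search_request(project_id, app_id, query, filter_expression, access_token):
    """Build (but do not send) the POST request shown in the curl example."""
    serving_config = (
        f"projects/{project_id}/locations/global/collections/default_collection"
        f"/engines/{app_id}/servingConfigs/default_search"
    )
    body = {
        "servingConfig": serving_config,
        "query": query,
        "filter": filter_expression,
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/{serving_config}:search",
        # json.dumps escapes the double quotes inside the filter string.
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with your own before sending.
request = build_search_request(
    project_id="PROJECT_ID",
    app_id="APP_ID",
    query="QUERY",
    filter_expression='siteSearch:"https://example.com/example_domain"',
    access_token="ACCESS_TOKEN",
)
print(request.full_url)

# To send the request (requires valid credentials and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Separating request construction from sending keeps the sketch runnable without credentials and makes the URL and body easy to inspect.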