Query overview

This page describes the SEARCH function and the enhanced query mode, which are used to perform full-text search queries on Spanner tables.

Query a search index

Spanner provides the SEARCH function to use for search index queries. An example use case would be an application where users enter text in a search box and the application sends the user input directly into the SEARCH function. The 'SEARCH' function would then use a search index to find that text.

The SEARCH function requires two arguments:

  • A search index name
  • A search query

The SEARCH function only works when a search index is defined. The SEARCH function can be combined with any arbitrary SQL constructs, such as filters, aggregations, or joins.

The SEARCH function can't be used with transaction queries.

The following query uses the SEARCH function to return all albums that have either friday or monday in the title:

SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'friday OR monday')

Search query

Search query uses raw search query syntax by default. Alternative syntaxes can be specified using the dialect argument.

rquery dialect

The default dialect is raw search query. Spanner uses a domain-specific language (DSL) called rquery.

The rquery language follows the same rules as the plain-text tokenizer when splitting the input search query into distinct terms. This includes segmentation of Asian languages.

For information about using rquery, see rquery syntax.

words dialect

The words dialect is like rquery, but simpler. It doesn't use any special operators. For example, OR is treated as a search term instead of a disjunction operator. The double quotes are handled as punctuations rather than a phrase search and they are ignored.

With the words dialect, AND is implicitly applied to all terms, and is required during matching. It follows the same rules as the plain-text tokenizer when splitting the input search query into terms.

For information about using the words dialect, see words syntax.

words_phrase dialect

The words_phrase dialect doesn't use any special operators and all terms are treated as a phrase, meaning the terms are required to be adjacent and in the order specified.

Same as rquery, the words_phrase dialect follows the same rules as the plain-text tokenizer when splitting the input search query into terms.

For information about using the words_phrase dialect, see words phrase syntax.

Enhanced query mode

Spanner offers two full-text search modes: a basic token-based search and a more advanced mode called enhance_query. When enabled, enhance_query expands the search query to include related terms and synonyms, increasing the likelihood of finding relevant results.

To enable this option, set the optional argument enhance_query=>true in the SEARCH function. For example, the search query hotl cal matches the album Hotel California.

SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'hotl cal', enhance_query=>true)

The enhance_query mode is a query-time option. It doesn't affect tokenization. You can use the same search index with or without enhance_query.

Google is continuously improving the query enhancement algorithms. As a result, a query with enhance_query == true might yield slightly different results over time.

When the enhance_query mode is enabled, it might increase the number of terms that the SEARCH function is looking for which could slightly elevate latency.

For example, the following query uses a three-second timeout and fails if enhance_query is unavailable:

@{require_enhance_query=true, enhance_query_timeout_ms=3000}
SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'fast car', enhance_query=>true)

For more information about using the enhance_query option, see the SEARCH function.

SQL query requirements

There are several conditions that a SQL query must meet to use a search index. If these conditions aren't met, the query uses either an alternative query plan or fails if no alternative plan exists.

Queries must meet the following conditions:

  • SEARCH and SEARCH_SUBSTRING functions require a search index. Spanner doesn't support these functions in queries against the base table or secondary indexes.

  • Partitioned indexes must have all partition columns bound by an equality condition in the WHERE clause of the query.

    For example, if a search index is defined as PARTITION BY x, y, the query must have a conjunct in the WHERE clause of x = <parameter or constant> AND y = <parameter or constant>. That search index isn't considered by the query optimizer if such a condition is missing.

  • All TOKENLIST columns referenced by SEARCH and SEARCH_SUBSTRING operators must be indexed in the same search index.

    For example, consider the following table and index definition:

    CREATE TABLE Albums (
        AlbumId STRING(MAX) NOT NULL,
        AlbumTitle STRING(MAX),
        AlbumStudio STRING(MAX),
        AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN,
        AlbumStudio_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumStudio)) HIDDEN
      ) PRIMARY KEY(AlbumId);
    
      CREATE SEARCH INDEX AlbumsTitleIndex ON Albums(AlbumTitle_Tokens);
      CREATE SEARCH INDEX AlbumsStudioIndex ON Albums(AlbumStudio_Tokens);
    

    The following query fails because there's no single search index that indexes both AlbumTitle_Tokens and AlbumStudio_Tokens:

    SELECT AlbumId
    FROM Albums
    WHERE SEARCH(AlbumTitle_Tokens, @p1) AND SEARCH(AlbumStudio_Tokens, @p2)
    
  • If the sort order column is nullable, both the schema and the query must exclude rows where the sort order column is NULL. For details, see Search index sort order.

  • If the search index is NULL filtered, the query must include the same NULL-filtering expression that's used in an index. See NULL-filtered search indexes for details.

  • Search indexes and search functions aren't supported in DML, partitioned DML, or partitioned queries.

  • Search indexes and search functions are typically used in read-only transactions. If application requirements allow stale results, we recommend running search queries with a staleness duration of 10 seconds or longer. For more information, see Read stale data. This is particularly useful for large search queries that fan out to multiple Paxos groups.

    Search indexes and search functions aren't recommended in read-write transactions. During execution, search queries lock an entire index partition; as a result, a high rate of search queries in read-write transactions might cause lock conflicts leading to latency spikes. By default, search indexes aren't automatically selected in read-write transactions. If a query is forced to use a search index in a read-write transaction it fails by default. It will also fail if the query contains any of the search functions. This behavior can be overridden with the @{ALLOW_SEARCH_INDEXES_IN_TRANSACTION=TRUE} statement-level hint (but queries are still prone to lock conflicts).

Once index eligibility conditions are met, the query optimizer tries to accelerate non-text query conditions (like Rating > 4). If the search index doesn't include the appropriate TOKENLIST column, the condition isn't accelerated and remains a residual condition.

Query parameters

Search query arguments are specified as either a literal or a query parameter. We recommend using query parameters for full-text search rather than string literals when arguments allow query parameter value.

Index selection

Spanner typically selects the most efficient index for a query using cost-based modeling. However, the FORCE_INDEX hint explicitly instructs Spanner to use a specific search index. For example, the following shows how to force Spanner to use the AlbumsIndex:

SELECT AlbumId
FROM Albums @{FORCE_INDEX=AlbumsIndex}
WHERE SEARCH(AlbumTitle_Tokens, "fifth symphony")

If the specified search index isn't eligible, the query fails, even if there are other eligible search indexes.

Snippets in search results

A snippet is a piece of text extracted from a given string that gives users a sense of what a search result contains, and the reason why the result is relevant to their query.

For example, Gmail uses snippets to indicate the portion of an email that matches the search query:

List of snippets

Having the database generate a snippet has several benefits:

  1. Convenience: You don't need to implement logic to generate snippets from a search query.
  2. Efficiency: Snippets reduce the output size from the server.

The SNIPPET function creates the snippet. It returns the relevant portion of the original string value along with positions of characters to highlight. The client can then choose how to display the snippet to the end user (for example, using highlighted or bold text). The SNIPPET function removes all HTML tags from the original string.

For example, the following uses SNIPPET to retrieve text from AlbumTitle:

SELECT AlbumId, SNIPPET(AlbumTitle, "Fast Car")
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, "Fast Car")

What's next