Mix full-text and non-text queries

This page describes how to perform a search that mixes full-text and non-text data.

Search indexes support full-text, exact match, and numeric columns. You can combine text and non-text conditions in the WHERE clause similarly to multi-column search queries. The query optimizer tries to optimize non-text predicates with a search index. If that's impossible, Spanner evaluates the condition for every row that matches the search index. Referenced columns not stored in the search index are fetched from the base table.

For example, consider the following example:

CREATE TABLE Albums (
  AlbumId STRING(MAX) NOT NULL,
  Title STRING(MAX),
  Rating FLOAT64,
  Genres ARRAY<STRING(MAX)>,
  Likes INT64,
  Cover BYTES(MAX),
  Title_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(Title)) HIDDEN,
  Rating_Tokens TOKENLIST AS (TOKENIZE_NUMBER(Rating)) HIDDEN,
  Genres_Tokens TOKENLIST AS (TOKEN(Genres)) HIDDEN
) PRIMARY KEY(AlbumId);

CREATE SEARCH INDEX AlbumsIndex
ON Albums(Title_Tokens, Rating_Tokens, Genres_Tokens)
STORING Likes;

The behavior of queries on this table include the following:

  • Rating and Genres are included in the search index. Spanner accelerates conditions using search index posting lists.

    SELECT Album
    FROM Albums
    WHERE Rating > 4
      AND ARRAY_INCLUDES_ANY(Genres, ['jazz'])
    
  • The query can combine conjunctions, disjunctions, and negations in any way, including mixing full-text and non-text predicates. This query is fully accelerated by the search index.

    SELECT Album
    FROM Albums
    WHERE (SEARCH(Title_Tokens, 'car')
           OR Rating > 4)
      AND NOT ARRAY_INCLUDES_ANY(Genres, ['jazz'])
    
  • Likes is stored in the index, but the schema doesn't request Spanner to build a token index for its possible values. Therefore, the full-text predicate on Title and non-text predicate on Rating is accelerated, but the predicate on Likes isn't. In Spanner, the query fetches all documents with the term "car" in the Title and a rating more than 4, then it filters documents that don't have at least 1000 likes. This query uses a lot of resources if almost all albums have the term "car" in their title and almost all of them have a rating of 5, but few albums have 1000 likes. In such cases, indexing Likes similarly to Rating saves resources.

    SELECT Album
    FROM Albums
    WHERE SEARCH(Title_Tokens, 'car')
      AND Rating > 4
      AND Likes >= 1000
    
  • Cover isn't stored in the index. The following query does a back join between AlbumsIndex and Albums to fetch Cover for all matching albums.

    SELECT AlbumId, Cover
    FROM Albums
    WHERE SEARCH(Title_Tokens, 'car')
      AND Rating > 4
    

What's next