Find approximate matches with fuzzy search

Note: This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the [Spanner editions overview](/spanner/docs/editions-overview).
This page describes how to use fuzzy search as part of a
[full-text search](/spanner/docs/full-text-search).

In addition to performing exact token searches with the
[`SEARCH`](/spanner/docs/reference/standard-sql/search_functions#search_fulltext)
and
[`SEARCH_SUBSTRING`](/spanner/docs/reference/standard-sql/search_functions#search_substring)
functions, Spanner also supports approximate (or fuzzy) searches. Fuzzy
searches find matching documents despite small differences between the query and
the document.
Spanner supports the following types of fuzzy search:

- N-grams-based approximate search
- Phonetic search using [Soundex](https://en.wikipedia.org/wiki/Soundex)

Use an n-grams-based approximate search
N-grams-based fuzzy search relies on the same
substring tokenization that a
[substring search](/spanner/docs/full-text-search/substring-search)
requires. The configuration of the tokenizer is important because it affects
search quality and performance. The following example shows how to create a
query with misspelled or differently spelled words to find approximate matches
in the search index.
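**Schema**

### GoogleSQL

```sql
CREATE TABLE Albums (
  AlbumId STRING(MAX) NOT NULL,
  AlbumTitle STRING(MAX),
  AlbumTitle_Tokens TOKENLIST AS (
    TOKENIZE_SUBSTRING(AlbumTitle, ngram_size_min=>2, ngram_size_max=>3,
      relative_search_types=>["word_prefix", "word_suffix"])) HIDDEN
) PRIMARY KEY(AlbumId);

CREATE SEARCH INDEX AlbumsIndex
ON Albums(AlbumTitle_Tokens)
STORING (AlbumTitle);
```

### PostgreSQL

This example uses
[`spanner.tokenize_substring`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions).

```sql
CREATE TABLE albums (
  albumid character varying NOT NULL,
  albumtitle character varying,
  albumtitle_tokens spanner.tokenlist GENERATED ALWAYS AS (
    spanner.tokenize_substring(albumtitle, ngram_size_min=>2, ngram_size_max=>3,
      relative_search_types=>'{word_prefix, word_suffix}'::text[])) VIRTUAL HIDDEN,
  PRIMARY KEY(albumid));

CREATE SEARCH INDEX albumsindex
ON albums(albumtitle_tokens)
INCLUDE (albumtitle);
```

**Query**

The following query finds the albums with titles that are the closest to
"Hatel Kaliphorn", such as "Hotel California".

### GoogleSQL

```sql
SELECT AlbumId
FROM Albums
WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, "Hatel Kaliphorn")
ORDER BY SCORE_NGRAMS(AlbumTitle_Tokens, "Hatel Kaliphorn") DESC
LIMIT 10
```

### PostgreSQL

This example uses
[`spanner.score_ngrams`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions)
and
[`spanner.search_ngrams`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions).

```sql
SELECT albumid
FROM albums
WHERE spanner.search_ngrams(albumtitle_tokens, 'Hatel Kaliphorn')
ORDER BY spanner.score_ngrams(albumtitle_tokens, 'Hatel Kaliphorn') DESC
LIMIT 10
```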
Optimize performance and recall for an n-grams-based approximate search
The sample query in the previous section searches in two phases, using two
different functions:
1. [`SEARCH_NGRAMS`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions)
   finds all candidate albums that have shared n-grams with the search query.
   For example, three-character n-grams for "California" include `[cal, ali,
   lif, ifo, for, orn, rni, nia]` and for "Kaliphorn" include `[kal, ali, lip,
   iph, pho, hor, orn]`. The shared n-grams in these data sets are `[ali,
   orn]`. By default, `SEARCH_NGRAMS` matches all documents with at least two
   shared n-grams, so "Kaliphorn" matches "California".
2. [`SCORE_NGRAMS`](/spanner/docs/reference/standard-sql/search_functions#score_ngrams)
   ranks matches by similarity. The similarity of two strings is defined as the
   ratio of distinct shared n-grams to the total number of distinct n-grams
   across both strings:
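$$ \frac{shared\_ngrams}{total\_ngrams_{index} + total\_ngrams_{query} - shared\_ngrams} $$

The n-gram overlap and this ratio can be illustrated with a small sketch (for intuition only; Spanner's actual tokenization is configured in the schema and also handles casing and relative-search tokens):

```python
def ngrams(text, n=3):
    """Return the set of distinct character n-grams of `text`."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

index_grams = ngrams("California")  # 8 distinct trigrams
query_grams = ngrams("Kaliphorn")   # 7 distinct trigrams
shared = index_grams & query_grams  # {'ali', 'orn'}

# similarity = shared / (total_index + total_query - shared)
score = len(shared) / (len(index_grams) + len(query_grams) - len(shared))
print(sorted(shared), round(score, 4))  # ['ali', 'orn'] 0.1538
```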
The search query is usually the same in both the `SEARCH_NGRAMS` and
`SCORE_NGRAMS` calls. The recommended way to keep them in sync is to pass the
query with a
[query parameter](/spanner/docs/reference/standard-sql/lexical#query_parameters)
rather than a string literal, and to specify the same query parameter in the
`SEARCH_NGRAMS` and `SCORE_NGRAMS` functions.
Spanner has three configuration arguments that can be used with
SEARCH_NGRAMS:
- The minimum and maximum sizes for n-grams are specified with the
  [`TOKENIZE_SUBSTRING`](/spanner/docs/reference/standard-sql/search_functions#tokenize_substring) or
  [`TOKENIZE_NGRAMS`](/spanner/docs/reference/standard-sql/search_functions#tokenize_ngrams)
  functions. We don't recommend one-character n-grams because they can match
  a very large number of documents. On the other hand, long n-grams cause
  `SEARCH_NGRAMS` to miss short misspelled words.
- The minimum number of n-grams that `SEARCH_NGRAMS` must match, set with the
  `min_ngrams` and `min_ngrams_percent` arguments of `SEARCH_NGRAMS`. Higher
  numbers typically make the query faster, but reduce recall.
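For example, a query might raise the minimum-match threshold to trade recall for speed (a sketch based on the arguments listed above; `@p` is a query parameter holding the search string):

```sql
SELECT AlbumId
FROM Albums
WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, @p, min_ngrams=>3)
ORDER BY SCORE_NGRAMS(AlbumTitle_Tokens, @p) DESC
LIMIT 10
```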
To achieve a good balance between performance and recall, configure these
arguments to fit your specific query and workload.
We also recommend including an inner LIMIT to avoid creating very expensive
queries when a combination of popular n-grams is encountered.
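### GoogleSQL

```sql
SELECT AlbumId
FROM (
  SELECT AlbumId,
         SCORE_NGRAMS(AlbumTitle_Tokens, @p) AS score
  FROM Albums
  WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, @p)
  LIMIT 10000  # inner limit
)
ORDER BY score DESC
LIMIT 10  # outer limit
```

### PostgreSQL

This example uses query parameter `$1`, which is bound to 'Hatel Kaliphorn'.

```sql
SELECT albumid
FROM
  (
    SELECT albumid, spanner.score_ngrams(albumtitle_tokens, $1) AS score
    FROM albums
    WHERE spanner.search_ngrams(albumtitle_tokens, $1)
    LIMIT 10000
  ) AS inner_query
ORDER BY inner_query.score DESC
LIMIT 10
```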
N-grams-based fuzzy search versus enhanced query mode
Alongside n-grams-based fuzzy search, the
enhanced query mode
also handles some misspelled words. Thus, there is some overlap between the two
features. The following table summarizes the differences:
| | N-grams-based fuzzy search | Enhanced query mode |
|---|---|---|
| Cost | Requires a more expensive substring tokenization based on n-grams | Requires a less expensive full-text tokenization |
| Search query types | Works well with short documents with a few words, such as a person name, city name, or product name | Works equally well with documents and search queries of any size |
| Partial words search | Performs a substring search that allows for misspellings | Only supports a search for entire words (`SEARCH_SUBSTRING` doesn't support the `enhance_query` argument) |
| Misspelled words | Supports misspelled words in either the index or the query | Only supports misspelled words in the query |
| Corrections | Finds any misspelled matches, even if the match isn't a real word | Corrects misspellings for common, well-known words |
Perform a phonetic search with Soundex
Spanner provides the
[`SOUNDEX`](/spanner/docs/reference/standard-sql/string_functions#soundex)
function for finding words that are spelled differently, but sound the same. For
example, `SOUNDEX("steven")`, `SOUNDEX("stephen")`, and `SOUNDEX("stefan")` are
all "s315", while `SOUNDEX("stella")` is "s340". `SOUNDEX` is case sensitive and
only works for Latin-based alphabets.
Phonetic search with SOUNDEX can be implemented with a generated column and a
search index as shown in the following example:
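### GoogleSQL

```sql
CREATE TABLE Singers (
  SingerId INT64,
  AlbumTitle STRING(MAX),
  AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN,
  Name STRING(MAX),
  NameSoundex STRING(MAX) AS (LOWER(SOUNDEX(Name))),
  NameSoundex_Tokens TOKENLIST AS (TOKEN(NameSoundex)) HIDDEN
) PRIMARY KEY(SingerId);

CREATE SEARCH INDEX SingersPhoneticIndex ON Singers(AlbumTitle_Tokens, NameSoundex_Tokens);
```

### PostgreSQL

This example uses
[`spanner.soundex`](/spanner/docs/reference/postgresql/functions-and-operators#string_functions).

```sql
CREATE TABLE singers (
  singerid bigint,
  albumtitle character varying,
  albumtitle_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.tokenize_fulltext(albumtitle)) VIRTUAL HIDDEN,
  name character varying,
  namesoundex character varying GENERATED ALWAYS AS (lower(spanner.soundex(name))) VIRTUAL,
  namesoundex_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.token(lower(spanner.soundex(name)))) VIRTUAL HIDDEN,
  PRIMARY KEY(singerid));

CREATE SEARCH INDEX singersphoneticindex ON singers(albumtitle_tokens, namesoundex_tokens);
```

The following query matches "stefan" to "Steven" on `SOUNDEX`, along with
`AlbumTitle` containing "cat":

### GoogleSQL

```sql
SELECT SingerId
FROM Singers
WHERE NameSoundex = LOWER(SOUNDEX("stefan")) AND SEARCH(AlbumTitle_Tokens, "cat")
```

### PostgreSQL

```sql
SELECT singerid
FROM singers
WHERE namesoundex = lower(spanner.soundex('stefan')) AND spanner.search(albumtitle_tokens, 'cat')
```

What's next

- Learn about [tokenization and Spanner tokenizers](/spanner/docs/full-text-search/tokenization).
- Learn about [search indexes](/spanner/docs/full-text-search/search-indexes).
- Learn about [full-text search queries](/spanner/docs/full-text-search/query-overview).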
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-28 UTC."],[],[],null,["# Find approximate matches with fuzzy search\n\n\u003cbr /\u003e\n\n\n| **Note:** This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the [Spanner editions overview](/spanner/docs/editions-overview).\n\n\u003cbr /\u003e\n\nThis page describes how to use a fuzzy search as part of a\n[full-text search](/spanner/docs/full-text-search).\n\nIn addition to performing exact token searches using the\n[`SEARCH`](/spanner/docs/reference/standard-sql/search_functions#search_fulltext)\nand\n[`SEARCH_SUBSTRING`](/spanner/docs/reference/standard-sql/search_functions#search_substring)\nfunctions, Spanner also supports approximate (or fuzzy) searches. Fuzzy\nsearches find matching documents despite small differences between the query and\nthe document.\n\nSpanner supports the following types of fuzzy search:\n\n- N-grams-based approximate search\n- Phonetic search using [Soundex](https://en.wikipedia.org/wiki/Soundex)\n\nUse an n-grams-based approximate search\n---------------------------------------\n\nN-grams-based fuzzy search relies on the same\nsubstring tokenization that a\n[substring search](/spanner/docs/full-text-search/substring-search)\nrequires. The configuration of the tokenizer is important as it affects\nsearch quality and performance. 
The following example shows how to create a\nquery with misspelled or differently spelled words to find approximate matches\nin the search index.\n\n**Schema** \n\n### GoogleSQL\n\n CREATE TABLE Albums (\n AlbumId STRING(MAX) NOT NULL,\n AlbumTitle STRING(MAX),\n AlbumTitle_Tokens TOKENLIST AS (\n TOKENIZE_SUBSTRING(AlbumTitle, ngram_size_min=\u003e2, ngram_size_max=\u003e3,\n relative_search_types=\u003e[\"word_prefix\", \"word_suffix\"])) HIDDEN\n ) PRIMARY KEY(AlbumId);\n\n CREATE SEARCH INDEX AlbumsIndex\n ON Albums(AlbumTitle_Tokens)\n STORING (AlbumTitle);\n\n### PostgreSQL\n\nThis example uses\n[`spanner.tokenize_substring`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions). \n\n CREATE TABLE albums (\n albumid character varying NOT NULL,\n albumtitle character varying,\n albumtitle_tokens spanner.tokenlist GENERATED ALWAYS AS (\n spanner.tokenize_substring(albumtitle, ngram_size_min=\u003e2, ngram_size_max=\u003e3,\n relative_search_types=\u003e'{word_prefix, word_suffix}'::text[])) VIRTUAL HIDDEN,\n PRIMARY KEY(albumid));\n\n CREATE SEARCH INDEX albumsindex\n ON albums(albumtitle_tokens)\n INCLUDE (albumtitle);\n\n**Query**\n\nThe following query finds the albums with titles that are the closest to\n\"Hatel Kaliphorn\", such as \"Hotel California\". \n\n### GoogleSQL\n\n SELECT AlbumId\n FROM Albums\n WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, \"Hatel Kaliphorn\")\n ORDER BY SCORE_NGRAMS(AlbumTitle_Tokens, \"Hatel Kaliphorn\") DESC\n LIMIT 10\n\n### PostgreSQL\n\nThis examples uses\n[`spanner.score_ngrams`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions)\nand\n[`spanner.search_ngrams`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions). 
\n\n SELECT albumid\n FROM albums\n WHERE spanner.search_ngrams(albumtitle_tokens, 'Hatel Kaliphorn')\n ORDER BY spanner.score_ngrams(albumtitle_tokens, 'Hatel Kaliphorn') DESC\n LIMIT 10\n\n### Optimize performance and recall for an n-grams-based approximate search\n\nThe sample query in the previous section searches in two phases, using two\ndifferent functions:\n\n1. [`SEARCH_NGRAMS`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions) finds all candidate albums that have shared n-grams with the search query. For example, three-character n-grams for \"California\" include `[cal, ali,\n lif, ifo, for, orn, rni, nia]` and for \"Kaliphorn\" include `[kal, ali, lip,\n iph, pho, hor, orn]`. The shared n-grams in these data sets are `[ali,\n orn]`. By default, `SEARCH_NGRAMS` matches all documents with at least two shared n-grams, therefore \"Kaliphorn\" matches \"California\".\n2. [`SCORE_NGRAMS`](/spanner/docs/reference/standard-sql/search_functions#score_ngrams) ranks matches by similarity. The similarity of two strings is defined as a ratio of distinct shared n-grams to distinct non-shared n-grams:\n\n$$ \\\\frac{shared\\\\_ngrams}{total\\\\_ngrams_{index} + total\\\\_ngrams_{query} - shared\\\\_ngrams} $$\n\nUsually the search query is the same across both the `SEARCH_NGRAMS` and\n`SCORE_NGRAMS` functions. The recommended way to do this is to use the argument\nwith\n[query parameters](/spanner/docs/reference/standard-sql/lexical#query_parameters)\nrather than with string literals, and specify the same query parameter in the\n`SEARCH_NGRAMS` and `SCORE_NGRAMS` functions.\n\nSpanner has three configuration arguments that can be used with\n`SEARCH_NGRAMS`:\n\n- The minimum and maximum sizes for n-grams are specified with the `TOKENIZE_SUBSTRING`(/spanner/docs/reference/standard-sql/search_functions#tokenize_substring) or [`TOKENIZE_NGRAMS`](/spanner/docs/reference/standard-sql/search_functions#tokenize_ngrams) functions. 
We don't recommend one character n-grams because they could match a very large number of documents. On the other hand, long n-grams cause `SEARCH_NGRAMS` to miss short misspelled words.\n- The minimum number of n-grams that `SEARCH_NGRAMS` must match (set with the `min_ngrams` and `min_ngrams_percent` arguments in `SEARCH_NGRAMS`). Higher numbers typically make the query faster, but reduce recall.\n\nIn order to achieve a good balance between performance and recall, you can\nconfigure these arguments to fit the specific query and workload.\n\nWe also recommend including an inner `LIMIT` to avoid creating very expensive\nqueries when a combination of popular n-grams is encountered. \n\n### GoogleSQL\n\n SELECT AlbumId\n FROM (\n SELECT AlbumId,\n SCORE_NGRAMS(AlbumTitle_Tokens, @p) AS score\n FROM Albums\n WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, @p)\n LIMIT 10000 # inner limit\n )\n ORDER BY score DESC\n LIMIT 10 # outer limit\n\n### PostgreSQL\n\nThis example uses query parameter `$1` which is bound to 'Hatel Kaliphorn'. \n\n SELECT albumid\n FROM\n (\n SELECT albumid, spanner.score_ngrams(albumtitle_tokens, $1) AS score\n FROM albums\n WHERE spanner.search_ngrams(albumtitle_tokens, $1)\n LIMIT 10000\n ) AS inner_query\n ORDER BY inner_query.score DESC\n LIMIT 10\n\n### N-grams-based fuzzy search versus enhanced query mode\n\nAlongside n-grams-based fuzzy search, the\n[enhanced query mode](/spanner/docs/full-text-search/query-overview#enhanced_query_mode)\nalso handles some misspelled words. Thus, there is some overlap between the two\nfeatures. The following table summarizes the differences:\n\nPerform a phonetic search with Soundex\n--------------------------------------\n\nSpanner provides the\n[`SOUNDEX`](/spanner/docs/reference/standard-sql/string_functions#soundex)\nfunction for finding words that are spelled differently, but sound the same. 
For\nexample, `SOUNDEX(\"steven\")`, `SOUNDEX(\"stephen\")` and`SOUNDEX(\"stefan\")` are\nall \"s315\", while `SOUNDEX(\"stella\")` is \"s340\". `SOUNDEX` is case sensitive and\nonly works for Latin-based alphabets.\n\nPhonetic search with `SOUNDEX` can be implemented with a generated column and a\nsearch index as shown in the following example: \n\n### GoogleSQL\n\n CREATE TABLE Singers (\n SingerId INT64,\n AlbumTitle STRING(MAX),\n AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN,\n Name STRING(MAX),\n NameSoundex STRING(MAX) AS (LOWER(SOUNDEX(Name))),\n NameSoundex_Tokens TOKENLIST AS (TOKEN(NameSoundex)) HIDDEN\n ) PRIMARY KEY(SingerId);\n\n CREATE SEARCH INDEX SingersPhoneticIndex ON Singers(AlbumTitle_Tokens, NameSoundex_Tokens);\n\n### PostgreSQL\n\nThis example uses\n[`spanner.soundex`](/spanner/docs/reference/postgresql/functions-and-operators#string_functions). \n\n CREATE TABLE singers (\n singerid bigint,\n albumtitle character varying,\n albumtitle_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.tokenize_fulltext(albumtitle)) VIRTUAL HIDDEN,\n name character varying,\n namesoundex character varying GENERATED ALWAYS AS (lower(spanner.soundex(name))) VIRTUAL,\n namesoundex_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.token(lower(spanner.soundex(name))) VIRTUAL HIDDEN,\n PRIMARY KEY(singerid));\n\n CREATE SEARCH INDEX singersphoneticindex ON singers(albumtitle_tokens, namesoundex_tokens);\n\nThe following query matches \"stefan\" to \"Steven\" on `SOUNDEX`, along with\n`AlbumTitle` containing \"cat\": \n\n### GoogleSQL\n\n SELECT SingerId\n FROM Singers\n WHERE NameSoundex = LOWER(SOUNDEX(\"stefan\")) AND SEARCH(AlbumTitle_Tokens, \"cat\")\n\n### PostgreSQL\n\n SELECT singerid\n FROM singers\n WHERE namesoundex = lower(spanner.soundex('stefan')) AND spanner.search(albumtitle_tokens, 'cat')\n\n\u003cbr /\u003e\n\nWhat's next\n-----------\n\n- Learn about [tokenization and Spanner 
tokenizers](/spanner/docs/full-text-search/tokenization).\n- Learn about [search indexes](/spanner/docs/full-text-search/search-indexes).\n- Learn about [full-text search queries](/spanner/docs/full-text-search/query-overview)."]]