This page describes how to find approximate nearest neighbors (ANN) and query vector embeddings using the ANN distance functions.
When a dataset is small, you can use k-nearest neighbors (KNN) to find the exact k nearest vectors. However, because a KNN search compares the query vector with every vector in the dataset, its latency and cost grow as your dataset grows. You can use ANN to find the approximate k nearest neighbors with significantly lower latency and cost.
In an ANN search, the k returned vectors might not be the true top k nearest neighbors, because the ANN search calculates approximate distances and might not look at all the vectors in the dataset. Occasionally, a few vectors that aren't among the top k nearest neighbors are returned. This is known as recall loss. How much recall loss is acceptable depends on your use case, but in most cases, losing a bit of recall in return for improved database performance is an acceptable tradeoff.
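To make recall loss concrete, the following Python sketch computes exact KNN by brute force (comparing the query with every vector, which is why KNN cost grows with the dataset) and measures recall as the fraction of the true k nearest neighbors that an approximate result contains. Recall loss is 1 minus that fraction. The function names and sample data are hypothetical illustrations, not part of any Spanner API, and Euclidean distance is used only as an example metric.

```python
import math

Vector = list[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Exact Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query: Vector, vectors: dict[str, Vector], k: int) -> list[str]:
    """Brute-force KNN: score every vector against the query, keep the k closest IDs."""
    ranked = sorted(vectors, key=lambda vec_id: euclidean_distance(query, vectors[vec_id]))
    return ranked[:k]


def recall_at_k(approx_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of the true k nearest neighbors that the approximate result contains."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    docs = {
        "a": [1.0, 2.0, 3.0],
        "b": [1.0, 2.0, 4.0],
        "c": [5.0, 5.0, 5.0],
        "d": [9.0, 9.0, 9.0],
    }
    truth = exact_knn([1.0, 2.0, 3.0], docs, k=2)  # ['a', 'b']
    # Suppose an ANN search returned "a" and "c": one true neighbor was missed.
    print(truth, recall_at_k(["a", "c"], truth))  # ['a', 'b'] 0.5
```

In this made-up scenario the approximate result has a recall of 0.5, which is a recall loss of 0.5.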
For more details about the approximate distance functions that Spanner supports (APPROX_COSINE_DISTANCE(), APPROX_EUCLIDEAN_DISTANCE(), and APPROX_DOT_PRODUCT()), see their GoogleSQL reference pages.
Query vector embeddings
Spanner accelerates ANN vector searches by using a vector index. To query vector embeddings with ANN, you must first create a vector index on the embedding column. You can then use any one of the three approximate distance functions to find the approximate nearest neighbors.
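The three approximate distance functions approximate cosine distance, Euclidean distance, and dot product. As a rough reference for what each measure ranks, the following Python sketch computes the exact values. The helper names are hypothetical and the code doesn't call Spanner. For the two distances, smaller values mean closer vectors; for the dot product, larger values mean more similar vectors.

```python
import math

Vector = list[float]


def dot_product(a: Vector, b: Vector) -> float:
    """Sum of element-wise products; larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance; smaller values mean closer vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 minus cosine similarity; compares direction only, smaller means closer."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    query = [1.0, 2.0, 3.0]
    candidate = [2.0, 4.0, 6.0]  # same direction as the query, twice as long
    print(dot_product(query, candidate))         # 28.0
    print(euclidean_distance(query, candidate))  # about 3.742 (the square root of 14)
    print(cosine_distance(query, candidate))     # about 0.0, up to rounding
```

The sample candidate shows why the choice matters: it points in exactly the same direction as the query (cosine distance of about 0) but still sits a nonzero Euclidean distance away.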
Restrictions when using the approximate distance functions include the following:
- The approximate distance function must calculate the distance between an embedding column and a constant expression (for example, a parameter or a literal).
- The approximate distance function output must be used in an ORDER BY clause as the sole sort key, and a LIMIT must be specified after the ORDER BY.
- The query must explicitly filter out rows that aren't indexed. In most cases, this means that the query must include a WHERE <column_name> IS NOT NULL clause that matches the vector index definition, unless the column is already marked as NOT NULL in the table definition.
For a detailed list of limitations, see the approximate distance function reference page.
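The following Python sketch assembles a query string that satisfies these restrictions: it compares the embedding column with a query parameter (a constant expression), uses the approximate distance function as the sole ORDER BY key followed by a LIMIT, and adds an IS NOT NULL filter for a nullable column. The build_ann_query helper and the @query_vector parameter name are hypothetical. Sorting APPROX_DOT_PRODUCT in descending order is an assumption based on larger dot products meaning more similar vectors; check the approximate distance function reference page for the authoritative rules.

```python
import json
from typing import Optional

# Approximate distance functions named on this page.
APPROX_FUNCTIONS = {"APPROX_COSINE_DISTANCE", "APPROX_EUCLIDEAN_DISTANCE", "APPROX_DOT_PRODUCT"}


def build_ann_query(
    table: str,
    key_column: str,
    embedding_column: str,
    *,
    function: str = "APPROX_EUCLIDEAN_DISTANCE",
    limit: int = 100,
    num_leaves_to_search: int = 10,
    column_is_nullable: bool = False,
    extra_filter: Optional[str] = None,
) -> str:
    """Hypothetical helper: build GoogleSQL text for an ANN query.

    The query vector is referenced as the @query_vector parameter, so the
    distance function compares the embedding column with a constant expression.
    """
    if function not in APPROX_FUNCTIONS:
        raise ValueError(f"unsupported approximate distance function: {function}")
    if limit <= 0:
        raise ValueError("an ANN query needs a positive LIMIT")

    filters = []
    if column_is_nullable:
        # Rows with NULL embeddings aren't indexed, so filter them out explicitly.
        filters.append(f"{embedding_column} IS NOT NULL")
    if extra_filter:
        filters.append(extra_filter)
    where_clause = ("WHERE " + " AND ".join(filters) + "\n") if filters else ""

    # Assumption: a larger dot product means a closer match, so sort it descending.
    direction = " DESC" if function == "APPROX_DOT_PRODUCT" else ""
    options = json.dumps({"num_leaves_to_search": num_leaves_to_search})

    # The approximate distance is the only ORDER BY key, followed by LIMIT.
    return (
        f"SELECT {key_column}\n"
        f"FROM {table}\n"
        f"{where_clause}"
        f"ORDER BY {function}(\n"
        f"  @query_vector, {embedding_column},\n"
        f"  options => JSON '{options}'){direction}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_ann_query(
        "Documents", "DocId", "NullableDocEmbedding",
        column_is_nullable=True, extra_filter="WordCount > 1000",
    ))
```

Running the sketch prints the nullable-column query from the examples on this page, with @query_vector in place of the literal array; the query vector would then be bound as a query parameter when the SQL runs.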
Examples
To search for the 100 vectors nearest to [1.0, 2.0, 3.0] among documents with more than 1,000 words, when the embedding column is defined as NOT NULL:
SELECT DocId
FROM Documents
WHERE WordCount > 1000
-- A larger num_leaves_to_search improves recall at the cost of higher latency.
ORDER BY APPROX_EUCLIDEAN_DISTANCE(
  ARRAY<FLOAT32>[1.0, 2.0, 3.0], DocEmbedding,
  options => JSON '{"num_leaves_to_search": 10}')
LIMIT 100
If the embedding column is nullable, the query must also filter out NULL embeddings so that it matches the vector index definition:
SELECT DocId
FROM Documents
-- NULL embeddings aren't indexed, so they must be filtered out explicitly.
WHERE NullableDocEmbedding IS NOT NULL AND WordCount > 1000
ORDER BY APPROX_EUCLIDEAN_DISTANCE(
  ARRAY<FLOAT32>[1.0, 2.0, 3.0], NullableDocEmbedding,
  options => JSON '{"num_leaves_to_search": 10}')
LIMIT 100
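Both examples set num_leaves_to_search to 10. As a simplified illustration of why searching fewer leaves is cheaper but can lose recall, the following Python sketch partitions vectors into leaves by their nearest centroid and scores only the vectors in the leaves closest to the query. This one-level partitioning is a swapped-in toy technique, not Spanner's actual vector index implementation; the centroids are supplied by hand instead of being trained, and all names and data are hypothetical.

```python
import math
from collections import defaultdict

Vector = list[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_leaves(vectors: dict[str, Vector], centroids: list[Vector]) -> dict[int, list[str]]:
    """Assign every vector ID to the leaf (centroid index) nearest to it."""
    leaves: dict[int, list[str]] = defaultdict(list)
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2_distance(vec, centroids[i]))
        leaves[nearest].append(vec_id)
    return leaves


def leaf_search(
    query: Vector,
    vectors: dict[str, Vector],
    centroids: list[Vector],
    leaves: dict[int, list[str]],
    k: int,
    num_leaves_to_search: int,
) -> tuple[list[str], int]:
    """Return (top-k IDs, number of vectors scored), scanning only the closest leaves."""
    # Rank leaves by how close their centroid is to the query.
    leaf_order = sorted(range(len(centroids)), key=lambda i: l2_distance(query, centroids[i]))
    # Only vectors in the first num_leaves_to_search leaves become candidates.
    candidates = [vec_id for i in leaf_order[:num_leaves_to_search] for vec_id in leaves.get(i, [])]
    # Exact distances are computed for the candidates only, then cut to k.
    candidates.sort(key=lambda vec_id: l2_distance(query, vectors[vec_id]))
    return candidates[:k], len(candidates)


if __name__ == "__main__":
    vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [4.0, 0.0], "d": [5.0, 0.0], "e": [10.0, 0.0]}
    centroids = [[0.5, 0.0], [4.5, 0.0], [10.0, 0.0]]
    leaves = build_leaves(vectors, centroids)
    for n in (1, 2, 3):
        top, scored = leaf_search([2.6, 0.0], vectors, centroids, leaves, k=2, num_leaves_to_search=n)
        print(n, top, scored)
```

With this sample data, searching one leaf scores 2 of the 5 vectors and returns ['c', 'd'], missing the true second neighbor 'b'. Searching two leaves scores 4 vectors and returns the exact result ['c', 'b'].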
What's next
Learn more about Spanner vector indexes.
Learn more about the GoogleSQL APPROX_COSINE_DISTANCE(), APPROX_EUCLIDEAN_DISTANCE(), and APPROX_DOT_PRODUCT() functions.
Learn more about the GoogleSQL VECTOR INDEX statements.
Learn more about vector index best practices.
Try the Getting started with Spanner Vector Search for a step-by-step example of using ANN.