如果您使用的 WHERE 子句过滤条件不是表主键的一部分,则可以创建二级索引,以通过仅限索引的扫描来加快操作速度。
GoogleSQL
CREATEINDEXDocsByAuthorONDocuments(Author)STORING(DocEmbedding);SELECTAuthor,DocId,DocEmbeddingFROMDocumentsWHEREAuthor="Mark Twain"ORDERBYEUCLIDEAN_DISTANCE(DocEmbedding,<embeddingsfor"book about the time traveling American">)LIMIT5;
PostgreSQL
CREATEINDEXDocsByAuthorONDocuments(Author)INCLUDE(DocEmbedding);SELECTAuthor,DocId,DocEmbeddingFROMDocumentsWHEREAuthor="Mark Twain"ORDERBYspanner.euclidean_distance(DocEmbedding,<embeddingsfor"that book about the time traveling American">)LIMIT5;
[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-27。"],[],[],null,["# Perform vector similarity search in Spanner by finding the K-nearest neighbors\n\n| **Note:** This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the [Spanner editions overview](/spanner/docs/editions-overview).\n\n\u003cbr /\u003e\n\nThis page describes how to perform a vector similarity search in\nSpanner by using the cosine distance, Euclidean distance, and dot\nproduct vector functions to find K-nearest neighbors. This information\napplies to both GoogleSQL-dialect databases and PostgreSQL-dialect databases. Before you read this page,\nit's important that you understand the following concepts:\n\n- [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance): measures the shortest distance between two vectors.\n- [Cosine distance](https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_distance): measures the cosine of the angle between two vectors.\n- [Dot product](https://mathworld.wolfram.com/DotProduct.html): calculates the cosine of the angle multiplied by the product of corresponding vector magnitudes. If you know that all the vector embeddings in your dataset are normalized, then you can use `DOT_PRODUCT()` as a distance function.\n- [K-nearest neighbors (KNN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm): a supervised machine learning algorithm used to solve classification or regression problems.\n\nYou can use vector distance functions to perform K-nearest neighbors (KNN)\nvector search for use cases like similarity search or retrieval-augmented\ngeneration. Spanner supports the `COSINE_DISTANCE()`,\n`EUCLIDEAN_DISTANCE()`, and `DOT_PRODUCT()` functions, which operate on vector\nembeddings, allowing you to find the KNN of the input embedding.\n\nFor example, after you [generate and save your operational Spanner\ndata as vector embeddings](/spanner/docs/ml-tutorial-embeddings), you can then\nprovide these vector embeddings as an input parameter in your query to find the\nnearest vectors in N-dimensional space to search for semantically similar or\nrelated items.\n\nAll three distance functions take the arguments `vector1` and `vector2`, which\nare of the type `array\u003c\u003e`, and must consist of the same dimensions and have the\nsame length. For more details about these functions, see:\n\n- [`COSINE_DISTANCE()` in GoogleSQL](/spanner/docs/reference/standard-sql/mathematical_functions#cosine_distance)\n- [`EUCLIDEAN_DISTANCE()` in GoogleSQL](/spanner/docs/reference/standard-sql/mathematical_functions#euclidean_distance)\n- [`DOT_PRODUCT()` in GoogleSQL](/spanner/docs/reference/standard-sql/mathematical_functions#dot_product)\n- [Mathematical functions in PostgreSQL](/spanner/docs/reference/postgresql/functions-and-operators#mathematical) (`spanner.cosine_distance()`, `spanner.euclidean_distance()`, and `spanner.dot_product()`)\n- [Choose among vector distance functions to measure vector embeddings similarity](/spanner/docs/choose-vector-distance-function).\n\nExamples\n--------\n\nThe following examples show KNN search, KNN search over partitioned data, and\nusing a secondary index with KNN.\n\nThe examples all use `EUCLIDEAN_DISTANCE()`. You can also use\n`COSINE_DISTANCE()`. In addition, if all the vector embeddings in your dataset\nare normalized, you can use `DOT_PRODUCT()` as a distance function.\n\n### Example 1: KNN search\n\nConsider a `Documents` table that has a column (`DocEmbedding`) of precomputed\ntext embeddings from the `DocContents` bytes column. \n\n### GoogleSQL\n\n CREATE TABLE Documents (\n UserId INT64 NOT NULL,\n DocId INT64 NOT NULL,\n Author STRING(1024),\n DocContents BYTES(MAX),\n DocEmbedding ARRAY\u003cFLOAT32\u003e\n ) PRIMARY KEY (UserId, DocId);\n\n### PostgreSQL\n\n CREATE TABLE Documents (\n UserId bigint NOT NULL,\n DocId bigint NOT NULL,\n Author varchar(1024),\n DocContents bytea,\n DocEmbedding float4[],\n PRIMARY KEY (UserId, DocId)\n );\n\nAssuming that an input embedding for \"baseball, but not professional baseball\"\nis the array `[0.3, 0.3, 0.7, 0.7]`, you can find the top five nearest documents\nthat match, with the following query: \n\n### GoogleSQL\n\n SELECT DocId, DocEmbedding FROM Documents\n ORDER BY EUCLIDEAN_DISTANCE(DocEmbedding,\n ARRAY\u003cFLOAT32\u003e[0.3, 0.3, 0.7, 0.8])\n LIMIT 5;\n\n### PostgreSQL\n\n SELECT DocId, DocEmbedding FROM Documents\n ORDER BY spanner.euclidean_distance(DocEmbedding,\n '{0.3, 0.3, 0.7, 0.8}'::float4[])\n LIMIT 5;\n\nThe expected results of this example: \n\n Documents\n +---------------------------+-----------------+\n | DocId | DocEmbedding |\n +---------------------------+-----------------+\n | 24 | [8, ...] |\n +---------------------------+-----------------+\n | 25 | [6, ...] |\n +---------------------------+-----------------+\n | 26 | [3.2, ...] |\n +---------------------------+-----------------+\n | 27 | [38, ...] |\n +---------------------------+-----------------+\n | 14229 | [1.6, ...] |\n +---------------------------+-----------------+\n\n### Example 2: KNN search over partitioned data\n\nThe query in the previous example can be modified by adding conditions to the\n`WHERE` clause to limit the vector search to a subset of your data. One common\napplication of this is to search over partitioned data, such as rows that belong\nto a specific `UserId`. \n\n### GoogleSQL\n\n SELECT UserId, DocId, DocEmbedding FROM Documents\n WHERE UserId=18\n ORDER BY EUCLIDEAN_DISTANCE(DocEmbedding,\n ARRAY\u003cFLOAT32\u003e[0.3, 0.3, 0.7, 0.8])\n LIMIT 5;\n\n### PostgreSQL\n\n SELECT UserId, DocId, DocEmbedding FROM Documents\n WHERE UserId=18\n ORDER BY spanner.euclidean_distance(DocEmbedding,\n '{0.3, 0.3, 0.7, 0.8}'::float4[])\n LIMIT 5;\n\nThe expected results of this example: \n\n Documents\n +-----------+-----------------+-----------------+\n | UserId | DocId | DocEmbedding |\n +-----------+-----------------+-----------------+\n | 18 | 234 | [12, ...] |\n +-----------+-----------------+-----------------+\n | 18 | 12 | [1.6, ...] |\n +-----------+-----------------+-----------------+\n | 18 | 321 | [22, ...] |\n +-----------+-----------------+-----------------+\n | 18 | 432 | [3, ...] |\n +-----------+-----------------+-----------------+\n\n### Example 3: KNN search over secondary index ranges\n\nIf the `WHERE` clause filter you're using isn't part of the table's primary key,\nthen you can create a secondary index to accelerate the operation with an\n[index-only scan](/spanner/docs/secondary-indexes#storing-clause). \n\n### GoogleSQL\n\n CREATE INDEX DocsByAuthor\n ON Documents(Author)\n STORING (DocEmbedding);\n\n SELECT Author, DocId, DocEmbedding FROM Documents\n WHERE Author=\"Mark Twain\"\n ORDER BY EUCLIDEAN_DISTANCE(DocEmbedding,\n \u003cembeddings for \"book about the time traveling American\"\u003e)\n LIMIT 5;\n\n### PostgreSQL\n\n CREATE INDEX DocsByAuthor\n ON Documents(Author)\n INCLUDE (DocEmbedding);\n\n SELECT Author, DocId, DocEmbedding FROM Documents\n WHERE Author=\"Mark Twain\"\n ORDER BY spanner.euclidean_distance(DocEmbedding,\n \u003cembeddings for \"that book about the time traveling American\"\u003e)\n LIMIT 5;\n\nThe expected results of this example: \n\n Documents\n +------------+-----------------+-----------------+\n | Author | DocId | DocEmbedding |\n +------------+-----------------+-----------------+\n | Mark Twain | 234 | [12, ...] |\n +------------+-----------------+-----------------+\n | Mark Twain | 12 | [1.6, ...] |\n +------------+-----------------+-----------------+\n | Mark Twain | 321 | [22, ...] |\n +------------+-----------------+-----------------+\n | Mark Twain | 432 | [3, ...] |\n +------------+-----------------+-----------------+\n | Mark Twain | 375 | [9, ...] |\n +------------+-----------------+-----------------+\n\nWhat's next\n-----------\n\n- Learn more about the [GoogleSQL `COSINE_DISTANCE()`, `EUCLIDEAN_DISTANCE()`, `DOT_PRODUCT()`](/spanner/docs/reference/standard-sql/mathematical_functions) functions.\n\n- Learn more about the [PostgreSQL `spanner.cosine_distance()`, `spanner.euclidean_distance()`, `spanner.dot_product()`](/spanner/docs/reference/postgresql/functions-and-operators#mathematical) functions.\n\n- Learn more about how to [Choose among vector distance functions to measure vector embeddings similarity](/spanner/docs/choose-vector-distance-function)."]]