The optimal values for your vector index options depend on your use case,
vector dataset, and query vectors. You can set and tune these values
by creating a new vector index and setting the index_option_list
in the CREATE VECTOR INDEX statement. You might need to perform iterative
tuning to find the best values for your specific workload.
Follow these guidelines when picking values:
tree_depth (tree level): If the table you're indexing has fewer than 10
million rows, use a tree_depth of 2. Otherwise, a tree_depth of 3
supports tables of up to about 10 billion rows.
num_leaves: Use the square root of the number of rows in the dataset. A
larger value can increase vector index build time. Avoid setting num_leaves
larger than the table_row_count divided by 1000 as this results in overly
small leaves and poor performance.
num_leaves_to_search: This option specifies how many leaf nodes of the index
are searched. Increasing num_leaves_to_search improves recall but also
increases latency and cost. We recommend setting num_leaves_to_search to 1%
of the total number of leaves defined in the CREATE VECTOR INDEX statement.
If you're using a filter clause, increase this value to widen the search.
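The guidelines above reduce to simple arithmetic on the row count. The following is a minimal sketch of that arithmetic, not an official tool: the function name `suggest_index_options` is hypothetical, and rounding down and enforcing a floor of 1 are assumptions made here so that small tables still produce usable values.

```python
import math


def suggest_index_options(table_row_count: int) -> dict:
    """Return starting values based on the guidelines; tune from here."""
    if table_row_count < 1:
        raise ValueError("table_row_count must be positive")

    # Fewer than 10 million rows: tree_depth 2. Otherwise: tree_depth 3.
    tree_depth = 2 if table_row_count < 10_000_000 else 3

    # Square root of the row count, but never more than row_count / 1000.
    # The floor of 1 is an assumption for very small tables.
    num_leaves = max(1, min(math.isqrt(table_row_count), table_row_count // 1000))

    # About 1% of the leaves, with an assumed floor of 1.
    num_leaves_to_search = max(1, num_leaves // 100)

    return {
        "tree_depth": tree_depth,
        "num_leaves": num_leaves,
        "num_leaves_to_search": num_leaves_to_search,
    }


print(suggest_index_options(100_000_000))
# {'tree_depth': 3, 'num_leaves': 10000, 'num_leaves_to_search': 100}
```

Treat the result as a starting point only. If you use a filter clause, raise num_leaves_to_search above the value computed here.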
If recall is acceptable but the cost of querying is too high, resulting in
low maximum QPS, try increasing num_leaves by following these steps:
1. Set num_leaves to some multiple k of its original value (for example,
2 * sqrt(table_row_count)).
2. Set num_leaves_to_search to the same multiple k of its original value.
3. Experiment with reducing num_leaves_to_search to improve cost and QPS
while maintaining recall.
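The steps above can be sketched as a small tuning loop. This is an illustrative sketch under stated assumptions: `scale_leaves` and `smallest_search_width` are hypothetical helper names, and `measure_recall` stands in for whatever procedure you use to run a sample of queries with a given num_leaves_to_search and report the recall.

```python
from typing import Callable, Tuple


def scale_leaves(num_leaves: int, num_leaves_to_search: int, k: int) -> Tuple[int, int]:
    """Steps 1 and 2: multiply both options by the same factor k."""
    return num_leaves * k, num_leaves_to_search * k


def smallest_search_width(
    start: int,
    target_recall: float,
    measure_recall: Callable[[int], float],
) -> int:
    """Step 3: walk num_leaves_to_search down from start while recall holds.

    Returns the smallest value tried that still met target_recall. If even
    start misses the target, start is returned unchanged.
    """
    best = start
    for candidate in range(start, 0, -1):
        if measure_recall(candidate) < target_recall:
            break
        best = candidate
    return best


new_num_leaves, new_width = scale_leaves(1000, 10, k=2)

# Stand-in recall curve for demonstration only; replace with real measurements.
def fake_recall(width: int) -> float:
    return min(1.0, width / 16)

print(new_num_leaves, smallest_search_width(new_width, 0.9, fake_recall))
```

Walking down one step at a time keeps the sketch simple. With large values, a coarser step or a binary search would need fewer measurements.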
Improve recall
To improve recall, consider tuning the num_leaves_to_search value or
rebuilding your vector index.
Increase the num_leaves_to_search value
If the num_leaves_to_search value is too small, it can be harder to find the
nearest neighbors for some query vectors, and recent queries might contain
more of these challenging vectors. Creating a new vector index with an
increased num_leaves_to_search value can help improve recall by searching
more leaves.
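To judge whether a larger num_leaves_to_search value helps, you need a recall number to compare before and after. The page doesn't say how to measure recall. The sketch below assumes the common definition: the fraction of the exact nearest neighbors that also appear in the approximate result. The function name `recall_at_k` and the idea of comparing row keys from an exact search against those from an ANN search are assumptions for illustration.

```python
from typing import Hashable, Sequence


def recall_at_k(approx_ids: Sequence[Hashable], exact_ids: Sequence[Hashable]) -> float:
    """Fraction of the exact nearest neighbors found by the approximate search."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    exact = set(exact_ids)
    hits = len(exact.intersection(approx_ids))
    return hits / len(exact)


# Two of the four true neighbors were returned by the approximate query.
print(recall_at_k([1, 2, 3, 4], [1, 2, 5, 6]))  # 0.5
```

Averaging this value over a representative sample of recent query vectors gives a more stable signal than any single query.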
Rebuild the vector index
The tree structure of the vector index is optimized for the dataset at the time
of creation, and is static thereafter. Therefore, if significantly different
vectors are added after creating the initial vector index, then the tree
structure might be sub-optimal, leading to poorer recall.
To rebuild your vector index without downtime:
1. Create a new vector index on the same embedding column as the current
vector index, updating parameters (for example, OPTIONS) as appropriate.
2. After the index creation completes, update the vector search query to
point at the new index with the FORCE_INDEX hint. This ensures that the
query uses the new vector index. You might also need to retune
num_leaves_to_search in your new query.
3. Drop the outdated vector index.
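The rebuild sequence can be sketched as the statements a migration would issue, in order: create the new index, switch queries to it, then drop the old index. This is a hedged sketch that only assembles strings. The table, column, and index names are hypothetical, the helper `rebuild_statements` is not part of any library, and the statement shapes reflect the author's understanding of the GoogleSQL syntax, so check them against the DDL reference before use.

```python
from typing import Dict, List, Union

OptionValue = Union[int, str]


def rebuild_statements(
    table: str,
    column: str,
    old_index: str,
    new_index: str,
    options: Dict[str, OptionValue],
) -> List[str]:
    """Return the three rebuild steps as strings, in execution order."""
    # String options are single-quoted; numeric options are emitted as-is.
    rendered = ", ".join(
        f"{name} = '{value}'" if isinstance(value, str) else f"{name} = {value}"
        for name, value in options.items()
    )
    create = (
        f"CREATE VECTOR INDEX {new_index} ON {table}({column}) "
        f"WHERE {column} IS NOT NULL OPTIONS ({rendered})"
    )
    # FROM clause fragment for the updated query (assumed hint syntax).
    query_from = f"FROM {table}@{{FORCE_INDEX={new_index}}}"
    drop = f"DROP VECTOR INDEX {old_index}"
    return [create, query_from, drop]


for statement in rebuild_statements(
    table="Documents",                # hypothetical table
    column="DocEmbedding",            # hypothetical embedding column
    old_index="DocEmbeddingIndex",    # hypothetical current index
    new_index="DocEmbeddingIndexV2",  # hypothetical replacement index
    options={"distance_type": "COSINE", "tree_depth": 2, "num_leaves": 1000},
):
    print(statement)
```

Run the drop only after every query has been switched to the new index. Until then the old index keeps serving traffic, which is what makes the rebuild possible without downtime.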
What's next
Learn more about Spanner vector indexes (/spanner/docs/vector-indexes).
Learn more about Spanner approximate nearest neighbors (/spanner/docs/find-approximate-nearest-neighbors).
Learn more about the GoogleSQL APPROX_COSINE_DISTANCE(), APPROX_EUCLIDEAN_DISTANCE(), and APPROX_DOT_PRODUCT() functions (/spanner/docs/reference/standard-sql/mathematical_functions).
Learn more about the GoogleSQL VECTOR INDEX statements (/spanner/docs/reference/standard-sql/data-definition-language#vector_index_statements).
PostgreSQL interface note: The examples in this topic are intended for GoogleSQL-dialect databases. This feature doesn't support the PostgreSQL interface.
Note: This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the Spanner editions overview (/spanner/docs/editions-overview).
Last updated 2025-08-28 UTC.