Maintain vector indexes

This page describes options for maintaining your vector indexes. Maintaining indexes helps ensure that they adapt to data changes that might reduce the accuracy of your search results. Use the strategies on this page to avoid degradation in query performance as your dataset grows.

View vector index metrics

If your table receives frequent updates or insertions, we recommend periodically reindexing the existing ScaNN index to improve its recall accuracy. You can monitor index metrics to see changes in vector distribution and the number of vector mutations since the index was built, and then reindex accordingly.

For more information about metrics, see View vector index metrics.
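As a rough, generic complement to the ScaNN-specific metrics, you can watch PostgreSQL's standard per-table counters to see how much the underlying data has changed. The following sketch uses the built-in pg_stat_user_tables view; the table name items is hypothetical. These counters accumulate since statistics were last reset, not since the index was built, so treat them only as a signal for when to check the index metrics.

```sql
-- Generic PostgreSQL table counters, not ScaNN-specific index metrics.
-- "items" is a hypothetical table that stores the embeddings.
SELECT relname,
       n_live_tup,   -- estimated number of live rows
       n_tup_ins,    -- rows inserted since statistics were last reset
       n_tup_upd,    -- rows updated since statistics were last reset
       n_tup_del     -- rows deleted since statistics were last reset
FROM pg_stat_user_tables
WHERE relname = 'items';
```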

Maintain indexes automatically

You can use the scann.enable_preview_features database flag (GUC) together with the index-level auto_maintenance parameter when you create a ScaNN index. With both settings in place, AlloyDB incrementally manages the index: as your dataset grows, it splits large outlier partitions. Splitting partitions aims to improve queries per second (QPS) and the quality of search results.

Any updates made to the index as a result of auto maintenance are permanent until AlloyDB updates the index again.

To enable AlloyDB to maintain an index automatically, enable the scann.enable_preview_features flag:

gcloud alloydb instances update INSTANCE_ID \
     --database-flags scann.enable_preview_features=on \
     --region=REGION_ID \
     --cluster=CLUSTER_ID \
     --project=PROJECT_ID

Replace the following:

  • INSTANCE_ID: the ID of the instance.
  • REGION_ID: the region where the instance is located, for example, us-central1.
  • CLUSTER_ID: the ID of the cluster that contains the instance.
  • PROJECT_ID: the ID of the project that contains the cluster.
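After the instance update completes, you can check the flag from a SQL session connected to that instance. SHOW is the standard PostgreSQL command for reading a configuration parameter; once the flag is applied, it should report on.

```sql
-- Expect "on" after the database flag has been applied to the instance.
SHOW scann.enable_preview_features;
```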

After you enable the scann.enable_preview_features flag, you can enable auto maintenance for indexes, or you can use the scann_index_maintenance function to invoke maintenance manually.

Enable auto-maintenance during index creation

To create a ScaNN index with auto maintenance enabled, run the following example command:

CREATE INDEX INDEX_NAME ON TABLE
  USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
  WITH (num_leaves=NUM_LEAVES_VALUE, auto_maintenance=on);

Replace the following:

  • INDEX_NAME: the name of the index you want to create, for example, my_scann_index. Index names are shared across your database, so make sure that each index name is unique within the database, not only within its table.

  • TABLE: the table to add the index to.

  • EMBEDDING_COLUMN: a column that stores vector data.

  • DISTANCE_FUNCTION: the distance function to use with this index. Choose one of the following:

    • L2 distance: l2

    • Dot product: dot_product

    • Cosine distance: cosine

  • NUM_LEAVES_VALUE: the number of partitions for this index. Set it to any value between 1 and 1048576. For more information about how to choose this value, see Tune a ScaNN index.
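For illustration, the following sketch fills in the placeholders for a hypothetical items table whose embedding column stores 768-dimensional vectors. It assumes that the vector (pgvector) and alloydb_scann extensions are available on your instance and that the table already contains representative data before you build the index. The num_leaves value is an arbitrary example, not a recommendation.

```sql
-- Assumed extensions: pgvector ("vector") and AlloyDB ScaNN ("alloydb_scann").
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS alloydb_scann;

-- Hypothetical table; populate it before building the index.
CREATE TABLE IF NOT EXISTS items (
  id        bigint PRIMARY KEY,
  embedding vector(768)
);

-- ScaNN index using cosine distance, with auto maintenance enabled.
-- An underscore-style name avoids having to double-quote the identifier,
-- which a hyphenated name such as my-scann-index would require.
CREATE INDEX items_embedding_scann ON items
  USING scann (embedding cosine)
  WITH (num_leaves = 1000, auto_maintenance = on);
```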

Manage leaves to search for split partitions automatically

If you have enabled automatic maintenance of indexes, then AlloyDB automatically splits partitions when the num_leaves threshold is reached. As the number of partitions grows due to these splits, you should adjust the number of leaves to search to maintain optimal performance.

To manage the number of leaves to search automatically, use the scann.pct_leaves_to_search parameter. It specifies the number of leaves to search as a percentage of the current number of partitions. For example, to search 1% of the current partitions, set this value to 1. If you expect your dataset to grow significantly, start with a value of 1.

You can set this parameter to any value from 0 to 100. The default value is 0, which disables the parameter; AlloyDB then uses scann.num_leaves_to_search to determine the number of leaves to search.
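Before you make a value the database default, you can try it in a single session. This sketch uses the standard PostgreSQL SET, SHOW, and RESET commands and pgvector's <=> cosine distance operator; the items table, embedding column, and id 42 are hypothetical.

```sql
-- Applies only to the current session.
SET scann.pct_leaves_to_search = 1;
SHOW scann.pct_leaves_to_search;

-- Example nearest-neighbor query that a cosine ScaNN index can serve:
-- find the 10 rows closest to the embedding of row 42.
SELECT id
FROM items
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 42)
LIMIT 10;

-- Return this session to the database or instance default.
RESET scann.pct_leaves_to_search;
```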

To set scann.pct_leaves_to_search for a database, run the following command:

ALTER DATABASE DATABASE_NAME SET scann.pct_leaves_to_search = PERCENTAGE_LEAVES_TO_SEARCH;

Replace the following:

  • DATABASE_NAME: the name of the database.
  • PERCENTAGE_LEAVES_TO_SEARCH: the percentage of the current number of partitions to search, from 0 to 100.
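For example, for a hypothetical database named mydb, the following sketch makes new sessions search 1% of the current partitions and then shows how to remove the setting again. In PostgreSQL, ALTER DATABASE ... SET affects only sessions started after the command runs, so reconnect before you test the new value.

```sql
-- New sessions in mydb search 1% of the current partitions.
ALTER DATABASE mydb SET scann.pct_leaves_to_search = 1;

-- Later, remove the database-level setting so sessions fall back to the
-- instance-level value (0 by default, which uses scann.num_leaves_to_search).
ALTER DATABASE mydb RESET scann.pct_leaves_to_search;
```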

Manually invoke maintenance

To invoke maintenance on a specific index on demand, run the following command, replacing INDEX_NAME with the name of the index:

SELECT scann_index_maintenance('INDEX_NAME');
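To run maintenance on every ScaNN index in the current database, you could drive scann_index_maintenance from the PostgreSQL system catalogs, as in the following sketch. It assumes that the function accepts the index name as text, as in the single-index command. Whether it also accepts schema-qualified or quoted names, which regclass produces for indexes outside your search_path or with unusual names, is an untested assumption.

```sql
-- Invoke maintenance for each index that uses the scann access method.
-- c.oid::regclass::text yields the index name, schema-qualified only when
-- the schema is not on the search_path.
SELECT c.oid::regclass::text AS index_name,
       scann_index_maintenance(c.oid::regclass::text)
FROM pg_class AS c
JOIN pg_am AS am ON am.oid = c.relam
WHERE c.relkind = 'i'
  AND am.amname = 'scann';
```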

Manually rebuild your index

You can manually rebuild an index to re-create it with the configuration that you specified when you created it.

To manually rebuild your index, run the following command:

REINDEX INDEX CONCURRENTLY INDEX_NAME;

Replace INDEX_NAME with the name of the index you want to rebuild, for example, my_scann_index.
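In PostgreSQL, REINDEX ... CONCURRENTLY can't run inside a transaction block, and if it fails partway, it can leave behind an invalid index whose name ends in ccnew or ccold. The following sketch, which uses the hypothetical index name items_embedding_scann, rebuilds the index and then lists any invalid indexes that remain so that you can drop them before retrying.

```sql
-- Run outside an explicit transaction block (no BEGIN ... COMMIT).
REINDEX INDEX CONCURRENTLY items_embedding_scann;

-- After a failed concurrent rebuild, look for leftover invalid indexes
-- (names ending in ccnew or ccold) and drop them before retrying.
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
```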

For more information about reindexing in PostgreSQL, see REINDEX.

What's next