This page describes options for maintaining your vector indexes. Maintaining indexes helps to ensure that the indexes adapt to data changes that might impact the accuracy of your search results. Use the strategies on this page to avoid degradation in query performance as your dataset grows.
View vector index metrics
If your table is prone to frequent updates or insertions, then we recommend periodically reindexing the existing ScaNN index in order to improve the recall accuracy for your index. You can monitor index metrics to view changes in vector distributions or vector mutations since the index was built, and then reindex accordingly.
For more information about metrics, see View vector index metrics.
Maintain indexes automatically
You can use the scann.enable_index_maintenance Grand Unified Configuration (GUC) database flag along with the index-level auto_maintenance parameter while creating a ScaNN index. Using these settings together lets AlloyDB incrementally manage the index and split large outlier partitions as your dataset grows. By splitting partitions, AlloyDB tries to provide better queries per second (QPS) and search results.
Auto-maintenance index updates persist until AlloyDB updates the index again.
To enable AlloyDB to maintain an index automatically, add the alloydb_scann extension to the shared_preload_libraries parameter, enable the scann.enable_index_maintenance database flag, and then load the alloydb_scann extension by restarting the database:
Add the alloydb_scann extension to the shared_preload_libraries list:
sudo sed -r -i "s|(shared_preload_libraries\s*=\s*)'(.*)'.*$|\1'\2,alloydb_scann'|" DATA_DIR/postgresql.conf
Verify that the parameter's configuration is set properly:
grep -iE 'shared_preload_libraries' DATA_DIR/postgresql.conf
Enable the scann.enable_index_maintenance flag by setting it in the postgresql.conf file:
scann.enable_index_maintenance = ON
Restart AlloyDB Omni for the parameter change to take effect:
Docker
docker container restart CONTAINER_NAME
Replace CONTAINER_NAME with the name that you assigned to the AlloyDB Omni container when you started it.
Podman
podman container restart CONTAINER_NAME
Replace CONTAINER_NAME with the name that you assigned to the AlloyDB Omni container when you started it.
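Before you edit the real configuration file, you can rehearse the two configuration changes on a scratch file. The following is a minimal sketch, not an official procedure: it assumes GNU sed, and the starting file contents (including pg_stat_statements) are hypothetical rather than AlloyDB Omni defaults.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the postgresql.conf edits on a scratch file.
# The initial contents are hypothetical, not AlloyDB Omni defaults.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
max_connections = 100
shared_preload_libraries = 'pg_stat_statements'
EOF

# Same substitution as the documented command, run without sudo
# because the scratch file belongs to the current user.
sed -r -i "s|(shared_preload_libraries\s*=\s*)'(.*)'.*$|\1'\2,alloydb_scann'|" "$conf"

# Append the maintenance flag only if it is not already set.
grep -q '^scann\.enable_index_maintenance' "$conf" \
  || echo "scann.enable_index_maintenance = ON" >> "$conf"

# Show the two settings that the restart will pick up.
result="$(grep -E '^(shared_preload_libraries|scann\.)' "$conf")"
echo "$result"
rm -f "$conf"
```

Note that the substitution always appends: running it a second time adds alloydb_scann again, and running it against an empty list ('') leaves a leading comma. Check the output of the grep verification step before you restart.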
After you enable the scann.enable_index_maintenance flag, you can enable auto-maintenance for indexes, or you can use the scann_index_maintenance function to manually invoke maintenance.
Enable auto-maintenance during index creation
To create a ScaNN index with auto-maintenance enabled, run the following example command:
CREATE INDEX INDEX_NAME ON TABLE
USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
WITH (num_leaves=NUM_LEAVES_VALUE, auto_maintenance=on);
Replace the following:
INDEX_NAME: the name of the index you want to create, for example, my-scann-index. The index names are shared across your database. Ensure that each index name is unique to each table in your database.
TABLE: the table to add the index to.
EMBEDDING_COLUMN: a column that stores vector data.
DISTANCE_FUNCTION: the distance function to use with this index. Choose one of the following:
L2 distance: l2
Dot product: dot_product
Cosine distance: cosine
NUM_LEAVES_VALUE: the number of partitions to apply to this index. Set to any value between 1 and 1048576. For more information about how to decide this value, see Tune a ScaNN index.
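To illustrate how the placeholders fit together, the following sketch composes the statement from concrete values and rejects values outside the ranges listed above. The helper build_scann_index_sql and the names items_scann_idx, items, and embedding are hypothetical and exist only for this example; the helper prints the SQL and doesn't run it.

```shell
#!/usr/bin/env bash
# Sketch: compose the CREATE INDEX statement from the placeholders.
# build_scann_index_sql and all concrete names are hypothetical.

build_scann_index_sql() {
  local index="$1" table="$2" column="$3" distance="$4" leaves="$5"

  # Accept only the three distance functions listed on this page.
  case "$distance" in
    l2|dot_product|cosine) ;;
    *) echo "unsupported distance function: $distance" >&2; return 1 ;;
  esac

  # num_leaves must be an integer between 1 and 1048576.
  # The 10# prefix forces base 10 so that leading zeros are not read as octal.
  if ! [[ "$leaves" =~ ^[0-9]+$ ]] || (( 10#$leaves < 1 || 10#$leaves > 1048576 )); then
    echo "num_leaves out of range: $leaves" >&2
    return 1
  fi

  printf 'CREATE INDEX %s ON %s USING scann (%s %s) WITH (num_leaves=%s, auto_maintenance=on);\n' \
    "$index" "$table" "$column" "$distance" "$leaves"
}

# Print the statement for a hypothetical table named items.
build_scann_index_sql items_scann_idx items embedding cosine 1000
```

To run the generated statement, you could pipe the output to psql, for example with psql -d DATABASE_NAME. The helper doesn't quote identifiers, so it suits only simple names like the ones shown.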
Manage leaves to search for split partitions automatically
If you have enabled automatic maintenance of indexes, then AlloyDB automatically splits partitions when the num_leaves threshold is reached. As the number of partitions grows due to these splits, you should adjust the number of leaves to search to maintain optimal performance.
To manage the number of leaves to search automatically, use pct_leaves_to_search. This parameter lets you specify the percentage of partitions to search. If you expect your dataset to grow significantly, then start by setting the pct_leaves_to_search value to 1. The parameter is disabled by default.
Set this value to the percentage of the current number of partitions. For example, to search 1% of the current number of partitions, set this value to 1.
You can set this parameter to any value between 0 and 100.
The default value is 0, which disables this parameter and uses scann.num_leaves_to_search to calculate the number of leaves to search.
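To see why a percentage tracks index growth, the following sketch works out how many partitions a setting of 1 would cover as splits increase the partition count. The partition counts and the leaves_for helper are hypothetical. The source doesn't state how AlloyDB rounds the result, so the integer division here is an assumption, and the sample counts are chosen to divide evenly.

```shell
#!/usr/bin/env bash
# Sketch: partitions covered by a percentage setting as the index grows.
# Partition counts are hypothetical. Integer division is an assumption:
# the source does not state how AlloyDB rounds the result.
pct=1

leaves_for() {
  local partitions="$1"
  echo $(( partitions * pct / 100 ))
}

for partitions in 1000 2000 4000; do
  echo "partitions=$partitions pct=$pct leaves_searched=$(leaves_for "$partitions")"
done
```

In this sketch the number of leaves searched doubles each time the partition count doubles, whereas a fixed scann.num_leaves_to_search value would cover a shrinking share of the index.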
To set the pct_leaves_to_search flag on your database, run the following command:
ALTER DATABASE DATABASE_NAME SET scann.pct_leaves_to_search = PERCENTAGE_LEAVES_TO_SEARCH;
Replace the following:
DATABASE_NAME: the name of the database.
PERCENTAGE_LEAVES_TO_SEARCH: the percentage of num_leaves to search.
Manually invoke maintenance
To invoke maintenance on a particular index on demand, run the following command:
SELECT scann_index_maintenance('INDEX_NAME');
Manually rebuild your index
You can manually rebuild your index if you want to rebuild it with the configurations you specified when it was created.
To manually rebuild your index, run the following command:
REINDEX INDEX CONCURRENTLY INDEX_NAME;
Replace INDEX_NAME with the name of the index you want to rebuild, for example, my-scann-index. The index names are shared across your database. Ensure that each index name is unique to each table in your database.
For more information about reindexing in PostgreSQL, see REINDEX.