This page describes how to manually manage the content of the columnar engine to accelerate query performance. It explains how to add and remove tables, materialized views, and ScaNN indexes.
Manage tables, materialized views, and ScaNN indexes
Based on your workload, you can manually add columns to the column store or drop some or all of the columns of a table or a materialized view from the column store. Query evaluation automatically uses the stored columnar data to answer queries.
Similarly, you can manually add a ScaNN index to the columnar engine to speed up vector search. This allows for efficient nearest neighbor searches on large datasets of vector embeddings.
When choosing which tables, materialized views and columns to add to the column store, consider both the size of the column store and the shape of the workload. Good candidates for selection include large tables or views that are frequently scanned. Within these tables or views, identify any large non-unique indexes used by the OLAP workload. You can add these indexes' columns to the column store and potentially drop the indexes, thereby eliminating the performance cost associated with their maintenance on the primary instance.
Use one of the following methods to manage the content of the columnar engine:
- Manage column store content by updating database flags
- Manage column store content using SQL functions
See What data you can add to the column store for information about what data types and data sources you can use when adding tables, columns, and materialized views to the column store.
Before you begin
- You must have one of these IAM roles in the Google Cloud project you are using:
roles/alloydb.admin
(the AlloyDB Admin predefined IAM role)roles/owner
(the Owner basic IAM role)roles/editor
(the Editor basic IAM role)
If you don't have any of these roles, contact your Organization Administrator to request access.
Manage column store content by updating database flags
You can manage column store content manually by updating the
google_columnar_engine.relations
database flag. The flag has a single
value that specifies all of your column store's data sources. During the
restart, columns specified in this flag are automatically populated into the
column store.
You can use this flag together with auto-columnarization. If the
columnar engine has memory available after populating the columns you
specify through the google_columnar_engine.relations
flag, then
auto-columnarization adds further columns to the column store, as
needed.
For more information on setting an instance's database flags using either Google Cloud CLI or the Google Cloud console, see Configure an instance's database flags.
Add columns using flags
To add columns to the column store, define an instance's
google_columnar_engine.relations
flag. Set its value to a comma-separated list
of items, with each item specifying a list of columns to include from a specific
table, in this format:
DATABASE_NAME.SCHEMA_NAME.TABLE_NAME(COLUMN_LIST)
Replace the following:
DATABASE_NAME: the database containing the columns to add to the column store.
SCHEMA_NAME: the schema identifying the tables or the materialized views to add to the column store—for example,
public
.TABLE_NAME: the table or the materialized view containing the columns to add to the column store.
COLUMN_LIST: a comma-separated list of the columns to add to the column store.
To add all of one table's or one materialized view's columns, omit the column list:
DATABASE_NAME.SCHEMA_NAME.TABLE_NAME
For more information on setting the database flags on an instance, see Configure an instance's database flags.
Google Cloud CLI CLI syntax when using google_columnar_engine.relations
If you use the Google Cloud CLI to set the google_columnar_engine.relations
flag, then you must use alternate delimiter syntax to separate the database flags that you set. This is because the column list is comma-separated, requiring you to use a non-comma character when specifying flags within the same command-line argument.
The following example demonstrates using the gcloud CLI to add two columns in the public
schema to the column store. The example sets a separate
flag to enable the columnar engine, because using the
gcloud alloydb instance update
command to set any database flag must also set all other database flags that you want to have non-default values, even if they're already set.
gcloud alloydb instances update INSTANCE_NAME \
--database-flags=^:^\
google_columnar_engine.relations='DATABASE_NAME.public.TABLE_NAME(COLUMN_1,COLUMN_2)'\
:google_columnar_engine.enabled=on\
[:FLAG_3=VALUE_3 ...] \
--region=REGION_ID \
--cluster=CLUSTER_ID
Drop columns using flags
To drop columns from the column store, set a new value for the
google_columnar_engine.relations
flag described in the previous section,
leaving out the columns that you want to drop.
To drop all of the columns from the column store, unset the
google_columnar_engine.relations
flag from your instance.
For more information on setting database flags on an instance, see Configure an instance's database flags.
Manage column store content using SQL functions
You can manage column store content manually by using SQL functions.
Add columns using SQL functions
Run the google_columnar_engine_add
SQL function to add columns
to the column store.
This method adds the specified columns to the column store and manages the columns in the connected node only. The new columns don't persist within the store across instance restarts.
This method doesn't change the
google_columnar_engine.relations
database flag. Auto
columnarization
does not consider columns added by this SQL function.
psql Client
SELECT google_columnar_engine_add( relation => 'TABLE_NAME', columns => 'COLUMN_LIST' );
-
Replace the following:
- 'TABLE_NAME': A string containing the name of the table or the
materialized view. If the table or the view is in a schema other than
public
, specify the name of the schema, a dot, and the name of the table or the view; for example,'myschema.mytable'
. - 'COLUMN_LIST': A string containing a comma-separated, case-sensitive list of the names of the columns you want to add. If you want to add all of the table's or the materialized view's columns to the column store, omit this parameter.
Drop columns using SQL functions
Run the google_columnar_engine_drop
SQL function as follows:
psql Client
SELECT google_columnar_engine_drop( relation => 'TABLE_NAME', columns => 'COLUMN_LIST' );
- 'TABLE_NAME': A string containing the name of the table or the
materialized view. If the table or the view is in a schema other than
public
, specify the name of the schema, a dot, and the name of the table or the view; for example,'myschema.mytable'
. - 'COLUMN_LIST': A string containing a comma-separated, case-sensitive list of the names of the columns you want to add. If you want to add all of the table's or the materialized view's columns to the column store, omit this parameter.
You can use the google_columnar_engine_drop
function to remove columns added
by editing the google_columnar_engine.relations
database flag. However, when
you do so, those columns are added back to the columnar store when the
instance restarts.
Manage ScaNN indexes in the columnar engine
To accelerate vector search, you can load ScaNN indexes to the columnar engine's index cache.
Add, remove, and verify ScaNN indexes using SQL functions
You can manually add, remove, and verify ScaNN indexes from the index cache using SQL functions. Adding an index to the cache can accelerate vector search queries.
Add a ScaNN index
To add a specific index to the columnar engine's index cache, use the following SQL function:
SELECT google_columnar_engine_add_index(index => 'SCANN_INDEX_NAME');
Replace SCANN_INDEX_NAME
with the name of the ScaNN index that you want to add to the columnar engine's index cache.
Remove a ScaNN index
To drop a specific ScaNN index from the columnar engine, use the following SQL function:
SELECT google_columnar_engine_drop_index(index => 'SCANN_INDEX_NAME');
Replace SCANN_INDEX_NAME
with the name of the ScaNN index that you want to remove.
Verify columnar engine usage
To verify if your vector index scans use the columnar engine, execute the EXPLAIN ANALYZE
command with the COLUMNAR_ENGINE
option:
EXPLAIN (ANALYZE TRUE, SCANN TRUE, COSTS FALSE, TIMING FALSE, SUMMARY FALSE, VERBOSE FALSE, COLUMNAR_ENGINE TRUE)
SELECT * FROM t ORDER BY val <=> '[0.5,0.5,0.5,0.5]' LIMIT 100;
The command returns a query plan similar to the following:
Index Scan using t_ix3 on t t_1 (actual rows=100 loops=1)
Order By: (val <=> '[0.5,0.5,0.5,0.5]'::vector)
Limit: 100
ScaNN Info: (... columnar engine nodes hit=6...)
Columnar Engine ScaNN Info: (index found=true)
The presence of columnar engine nodes hit
and Columnar Engine ScaNN Info: (index found=true)
in the output confirms that the columnar engine is used for the query.