This document describes how to perform hybrid searches in AlloyDB for PostgreSQL, which combines text search, keyword matching, and vector search, semantic similarity, using the pgvector
extension. Hybrid search lets you retrieve highly relevant results by leveraging both exact keyword matches and semantically similar content.
Run a similarity search with text and vector input
To perform a hybrid search in AlloyDB for PostgreSQL, you'll need to create both a vector index and a text search index on your table. Then, you'll combine the results from both searches and re-rank them to present the most relevant information.
Create a GIN index
A Generalized Inverted Index (GIN) index is a specialized index type optimized for searching within composite values, such as arrays, JSONB, and full-text search data.
To create a GIN index on your text data to perform a full text search, run the following:
CREATE INDEX INDEX_NAME ON TABLE USING GIN (to_tsvector('english', COLUMN_NAME))
Replace the following:
INDEX_NAME
: the name of the index you want to create —for example,my-gin-index
.TABLE
: the table to add the index to.COLUMN_NAME
: the column that stores the text data you want to search.
Create a ScaNN index
To apply a two-level tree index using the ScaNN algorithm to a column containing stored vector embeddings, run the following DDL query:
CREATE INDEX INDEX_NAME ON TABLE
USING scann (EMBEDDING_COLUMN DISTANCE_FUNCTION)
WITH (num_leaves=NUM_LEAVES_VALUE);
Replace the following:
INDEX_NAME
: the name of the index you want to create—for example,my-scann-index
. The index names are shared across your database. Ensure that each index name is unique to each table in your database.TABLE
: the table to add the index to.EMBEDDING_COLUMN
: a column that storesvector
data.DISTANCE_FUNCTION
: the distance function to use with this index. Choose one of the following:L2 distance:
l2
Dot product:
dot_product
Cosine distance:
cosine
NUM_LEAVES_VALUE
: the number of partitions to apply to this index. Set to any value between 1 to 1048576. For more information about how to decide this value, see Tune aScaNN
index.
To learn more about different ScaNN index configurations, see Create a ScaNN
index. You can also create a HNSW index.
Perform a hybrid search using reciprocal rank fusion
Hybrid search involves performing separate vector and text searches, then combining and re-ranking results using Reciprocal Rank Fusion, RRF. RRF is a rank-based algorithm that combines multiple ranked lists of search results into a single ranked list by assigning a score to each document based on its reciprocal rank across all contributing lists, with higher-ranked documents receiving a greater contribution. Use the following SQL Query to combine full text search and hybrid search and re-rank the results:
WITH vector_search AS (
SELECT id,
RANK () OVER (ORDER BY embedding <=> google_ml.embedding('MODEL_IDVERSION_TAG', 'TEXT')) AS rank
FROM <var>TABLE</var>
ORDER BY embedding <=> google_ml.embedding('MODEL_IDVERSION_TAG', 'TEXT') LIMIT 10
),
text_search AS (
SELECT id,
RANK () OVER (ORDER BY ts_rank(to_tsvector('english', COLUMN_NAME), to_tsquery(KEYWORD)) desc)
FROM <var>TABLE</var>
WHERE to_tsvector('english', COLUMN_NAME) @@ to_tsquery(KEYWORD)
ORDER BY ts_rank(to_tsvector('english', COLUMN_NAME), to_tsquery(KEYWORD)) desc
LIMIT 10
)
SELECT
COALESCE(vector_search.id, text_search.id) AS id,
COALESCE(1.0 / (60 + vector_search.rank), 0.0) + COALESCE(1.0 / (60 + text_search.rank), 0.0) AS rrf_score
FROM vector_search FULL OUTER JOIN text_search ON vector_search.id = text_search.id
ORDER BY rrf_score DESC
LIMIT 5;
Replace the following:
TABLE
: the table containing your data.MODEL_ID
: the ID of the model to query.If you are using the Vertex AI Model Garden, then specify
text-embedding-005
as the model ID. These are the cloud-based models that AlloyDB can use for text embeddings. For more information, see Text embeddings.Optional:
VERSION_TAG
: the version tag of the model to query. Prepend the tag with@
.If you are using one of the
text-embedding-005
English models with Vertex AI, then specify one of the version tags—for example,text-embedding-005
, listed in Model versions.Google strongly recommends that you always specify the version tag. If you don't specify the version tag, then AlloyDB uses the latest model version, which might lead to unexpected results.
TEXT
: the text to translate into a vector embedding.KEYWORD
: the keyword you want to search for.COLUMN_NAME
: a column that stores contains the text data you want to search.
Explanation of the Hybrid Search Query:
vector_search CTE
: Performs a standard vector similarity search, ordering results by cosine distance and assigning a rank. It retrieves the top 10 most semantically similar products.text_search CTE
: Executes a text search usingto_tsvector
andto_tsquery
, calculating relevance withts_rank
and retrieving the top 10 most relevant text matches.Final SELECT Statement
: Joins vector and text search results using aFULL OUTER JOIN
, selects the product ID, calculates the RRF score, orders by score, and retrieves the top 5 results.