Generate text embeddings

The google_ml_integration extension includes embedding functions in two different namespaces: public and google_ml. This page describes how to generate text embeddings using functions from these namespaces.

The embedding() function in the public schema can be used with any Vertex AI embedding model without registering the endpoint. If you want to pass any custom information such as the task type, register the endpoint, and then use the google_ml.embedding() function in the google_ml schema. For more information about registering an endpoint, see Register a model.
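As a sketch, the two invocations look like the following. This assumes the extension is installed and the database is connected to Vertex AI; the model ID text-embedding-005 matches the example later on this page, and my-registered-model is a hypothetical model ID you would have registered beforehand:

```sql
-- Public-schema function: pass the Vertex AI model ID and the text
-- to embed directly, with no prior endpoint registration.
SELECT embedding('text-embedding-005', 'AlloyDB is a PostgreSQL-compatible database');

-- google_ml-schema function: used with a registered endpoint
-- ('my-registered-model' is a hypothetical registered model ID).
SELECT google_ml.embedding('my-registered-model', 'AlloyDB is a PostgreSQL-compatible database');
```

Both calls return the embedding as an array of floating-point values representing the input text.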

How embeddings work

Imagine a database running on AlloyDB with the following characteristics:

  • The database contains a table, items. Each row in this table describes an item that your business sells.

  • The items table contains a column, complaints. This TEXT column stores buyer complaints logged about each item.

  • The database integrates with the Vertex AI Model Garden, giving it access to the text-embedding-005 English models.
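A minimal sketch of the items table described above might look like the following; the id and name columns are illustrative assumptions, and only the complaints column is taken from this example:

```sql
-- Hypothetical schema for the items table in this example.
CREATE TABLE items (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name TEXT,              -- assumed column: the item's name
  complaints TEXT         -- buyer complaints logged about the item
);
```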

Even though this database stores complaints about items, these complaints are stored as plain text, making them difficult to query. For example, to see which items have the most complaints from customers who received the wrong color of merchandise, you can perform ordinary SQL queries on the table that look for various keyword matches. However, this approach matches only rows that contain those exact keywords.

For example, a basic SQL query such as SELECT * FROM items WHERE complaints LIKE '%wrong color%' doesn't return a row whose complaints field contains only The picture shows a blue one, but the one I received was red.

SQL queries using LLM-powered embeddings can help return semantically similar responses for such queries. By applying embeddings, you can query the table in this example for items whose complaints have semantic similarity to a given text prompt, such as It was the wrong color.
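Such a query could be sketched as follows. This assumes the pgvector extension and its cosine-distance operator (<=>) are available, and that embedding() returns an array that can be cast to the vector type:

```sql
-- Sketch: rank items by semantic similarity between each complaint
-- and the prompt text. The real[] result of embedding() is cast to
-- vector so the <=> distance operator can compare the two.
SELECT name, complaints
FROM items
ORDER BY embedding('text-embedding-005', complaints)::vector
     <=> embedding('text-embedding-005', 'It was the wrong color')::vector
LIMIT 10;
```

Note that this sketch calls the model once per row at query time; in practice you would typically precompute embeddings into a stored vector column and compare against that column instead.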

To generate embeddings, select one of the following schemas.

What's next