Overview of Vertex AI Vector Search

Vector Search is based on vector search technology developed by Google Research. With Vector Search, you can leverage the same infrastructure that provides a foundation for Google products such as Google Search, YouTube, and Play.


Vector Search can search across billions of items to find those that are semantically similar or semantically related. A vector similarity-matching service has many use cases, such as implementing recommendation engines, search engines, chatbots, and text classification.

[Image: Dress query]

One possible use case for Vector Search is an online retailer with an inventory of hundreds of thousands of clothing items. In this scenario, the retailer could use the multimodal embedding API to create embeddings of these items, and then use Vector Search to match text queries to the most semantically similar images. For example, a search for "yellow summer dress" would return and display the most similar items. Vector Search can search at scale, with high queries per second (QPS), high recall, low latency, and cost efficiency.
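To make the idea of matching a query to similar items concrete, here is a minimal local sketch of an exact similarity search over toy embeddings. It is not how Vector Search is implemented: the three-dimensional vectors, the item names, and the helper functions `cosine_similarity` and `top_k` are all hypothetical, and real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k):
    """Return the ids of the k items most similar to the query (exact search)."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Hypothetical embeddings for three catalog items (made-up values).
catalog = {
    "yellow_sundress": [0.9, 0.8, 0.1],
    "yellow_raincoat": [0.9, 0.1, 0.7],
    "black_winter_coat": [0.1, 0.0, 0.9],
}

# Hypothetical embedding of the text query "yellow summer dress".
query = [1.0, 0.9, 0.0]

results = top_k(query, catalog, k=2)
print(results)
```

This exact search compares the query against every item, which does not scale to billions of vectors. Approximate nearest neighbor search trades a small amount of recall for much lower latency.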

The use of embeddings is not limited to words or text. You can generate semantic embeddings for many kinds of data, including images, audio, video, and user preferences. For generating a multimodal embedding with Vertex AI, see Get multimodal embeddings.

How to use Vector Search for semantic matching

Semantic matching can be simplified into a few steps. First, you generate embedding representations of your items (done outside of Vector Search). Second, you upload your embeddings to Google Cloud and link your data to Vector Search. After your embeddings are added to Vector Search, you can create an index and run queries against it to get recommendations or results.

Generate an embedding

Generate an embedding for your dataset. This involves preprocessing the data in a way that makes it efficient to search for approximate nearest neighbors (ANN). You can do this outside of Vertex AI or you can use Generative AI on Vertex AI to create an embedding. With Generative AI on Vertex AI, you can create both text and multimodal embeddings.

Add your embedding to Cloud Storage

Upload your embedding to Cloud Storage so that the Vector Search service can access it.
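As an illustration of preparing embeddings for upload, the sketch below writes one JSON record per line to a local file. The field names `id` and `embedding`, the file name, and the helper `write_embeddings_jsonl` are assumptions made for this example. Confirm the exact input format and directory layout in the Vector Search documentation before relying on it.

```python
import json
import os
import tempfile


def write_embeddings_jsonl(records, path):
    """Write (item_id, vector) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item_id, vector in records:
            # Assumed record shape: an "id" string and an "embedding" list of floats.
            f.write(json.dumps({"id": item_id, "embedding": vector}) + "\n")
    return path


# Hypothetical items with made-up 3-dimensional embeddings.
records = [
    ("dress_001", [0.9, 0.8, 0.1]),
    ("coat_002", [0.1, 0.0, 0.9]),
]

out_path = write_embeddings_jsonl(
    records, os.path.join(tempfile.mkdtemp(), "embeddings.json")
)

with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines[0])
```

The resulting file could then be copied to a bucket, for example with `gcloud storage cp`, using a bucket and path of your choosing.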

Upload to Vector Search

Connect your embeddings to Vector Search to perform nearest neighbor search. You create an index from your embedding, which you can deploy to an index endpoint to query. The query returns the approximate nearest neighbors. To create an index, see Manage indexes. To deploy your index to an endpoint, see Deploy and manage index endpoints.
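The create, deploy, and query flow might look roughly like the following sketch, assuming the Vertex AI SDK for Python (`google-cloud-aiplatform`) is installed and authenticated. The display names, the deployed index ID, the neighbor count, and the distance measure are placeholders chosen for illustration, and the wrapper function `create_deploy_and_query` is hypothetical. Index creation and deployment are long-running, billable operations, so treat this as an outline rather than a script to run as-is.

```python
def create_deploy_and_query(project, location, embeddings_uri, dimensions, query_embedding):
    """Create an index from embeddings in Cloud Storage, deploy it, and query it."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)

    # Build an ANN index from the embedding files under embeddings_uri (gs://...).
    index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
        display_name="clothing-index",  # placeholder name
        contents_delta_uri=embeddings_uri,
        dimensions=dimensions,
        approximate_neighbors_count=10,  # placeholder value
        distance_measure_type="DOT_PRODUCT_DISTANCE",  # placeholder choice
    )

    # Create an endpoint and deploy the index to it so that it can serve queries.
    endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
        display_name="clothing-endpoint",  # placeholder name
        public_endpoint_enabled=True,
    )
    endpoint.deploy_index(index=index, deployed_index_id="clothing_deployed")

    # Ask for the approximate nearest neighbors of a single query embedding.
    neighbors = endpoint.find_neighbors(
        deployed_index_id="clothing_deployed",
        queries=[query_embedding],
        num_neighbors=10,
    )
    return [(n.id, n.distance) for n in neighbors[0]]
```

A call would pass your own project ID, region, Cloud Storage URI, embedding dimensionality, and a query vector produced by the same embedding model as the indexed items.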

Evaluate the results

After you have the approximate nearest neighbor results, you can evaluate them to see how well they meet your needs. If the results are not accurate enough, you can adjust the parameters of the algorithm or enable scaling to support more queries per second. You do this by updating the configuration file that configures your index. To learn more, see Configure index parameters.
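One common way to evaluate approximate results is to measure recall against a ground truth set of exact nearest neighbors. The sketch below computes that fraction. The function name `recall` and the item IDs are hypothetical, and the ground truth list is assumed to come from an exact (brute-force) search over the same data.

```python
def recall(returned_ids, ground_truth_ids):
    """Fraction of the true nearest neighbors that the index actually returned."""
    truth = set(ground_truth_ids)
    hits = len(set(returned_ids) & truth)
    return hits / len(truth)


# Hypothetical query for 20 nearest neighbors.
ground_truth = [f"item_{i}" for i in range(20)]

# The index returned 19 of the true neighbors plus one item that is not a true neighbor.
returned = [f"item_{i}" for i in range(19)] + ["item_99"]

score = recall(returned, ground_truth)
print(f"recall = {score:.0%}")
```

If the measured recall is lower than your application needs, that is a signal to revisit the index parameters.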

Vector Search terminology

This list contains some important terminology that you'll need to understand to use Vector Search:

  • Vector: A vector is a list of float values that has magnitude and direction. It can be used to represent any kind of data, such as numbers, points in space, and directions.
  • Embedding: An embedding is a type of vector that's used to represent data in a way that captures its semantic meaning. Embeddings are typically created using machine learning techniques, and they are often used in natural language processing (NLP) and other machine learning applications.
    • Dense embeddings: Dense embeddings represent the semantic meaning of text, using arrays that mostly contain non-zero values. With dense embeddings, similar search results can be returned based on semantic similarity.
    • Sparse embeddings: Sparse embeddings represent text syntax, using high-dimensional arrays that contain very few non-zero values compared to dense embeddings. Sparse embeddings are often used for keyword searches.
  • Hybrid search: Hybrid search uses both dense and sparse embeddings, which lets you search based on a combination of keyword search and semantic search. Vector Search supports search based on dense embeddings. Sparse embeddings and hybrid search are supported as Public preview features.
  • Index: A collection of vectors deployed together for similarity search. Vectors can be added to or removed from an index. Similarity search queries are issued to a specific index and search the vectors in that index.
  • Ground truth: The real-world reference against which machine learning results are verified for accuracy, such as a ground truth dataset.
  • Recall: The percentage of nearest neighbors returned by the index that are actually true nearest neighbors. For example, if a nearest neighbor query for 20 nearest neighbors returned 19 of the ground truth nearest neighbors, the recall is 19/20 × 100 = 95%.
  • Restrict: Functionality that limits searches to a subset of the index by using Boolean rules. Restrict is also referred to as "filtering". With Vector Search, you can use numeric filtering and text attribute filtering.
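To illustrate the idea behind restricts, the following sketch filters a small catalog with Boolean allow and deny rules on text attributes before any similarity search would run. It is a simplified local model, not the service's exact semantics: the `allow` and `deny` keys, the namespaces, and the helper `matches` are assumptions made for this example.

```python
def matches(item_tokens, filters):
    """Return True if an item satisfies every namespace rule in filters."""
    for namespace, rule in filters.items():
        tokens = set(item_tokens.get(namespace, []))
        allow = set(rule.get("allow", []))
        deny = set(rule.get("deny", []))
        # With an allow list, the item needs at least one allowed token.
        if allow and not (tokens & allow):
            return False
        # Any denied token excludes the item.
        if tokens & deny:
            return False
    return True


# Hypothetical catalog: item id -> text attributes grouped by namespace.
catalog = {
    "dress_001": {"color": ["yellow"], "season": ["summer"]},
    "dress_002": {"color": ["red"], "season": ["summer"]},
    "coat_003": {"color": ["yellow"], "season": ["winter"]},
}

# Keep yellow items, but exclude anything tagged for winter.
filters = {"color": {"allow": ["yellow"]}, "season": {"deny": ["winter"]}}

candidates = sorted(item_id for item_id, tokens in catalog.items() if matches(tokens, filters))
print(candidates)
```

Only the items that pass the filter would then be considered as nearest neighbor candidates.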

What's next