Text embeddings overview

Embeddings are numerical representations of text, images, or videos that capture relationships between inputs. Machine learning models, especially Generative AI models, are well suited to creating embeddings because they identify patterns within large datasets. Applications can use embeddings to process and produce language, recognizing complex meanings and semantic relationships specific to your content.

Vertex AI on Google Distributed Cloud (GDC) air-gapped supports Text Embedding APIs for English and multilingual textual input. Text Embedding works by converting text into arrays of floating-point numbers called vectors. These vectors are designed to capture the meaning of the text. The length of the embedding array is called the vector's dimensionality. For example, one passage of text might be represented by a vector containing hundreds of dimensions. Then, by calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects.
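As a sketch of the distance calculation described above: a common similarity measure for embedding vectors is cosine similarity. The vectors below are short, hand-picked placeholders, not output from the Text Embedding API, which produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings; real embeddings have hundreds
# of dimensions and come from the embedding model, not hand-picked values.
v_cat = [0.8, 0.1, 0.05, 0.05]
v_kitten = [0.75, 0.15, 0.05, 0.05]
v_car = [0.05, 0.05, 0.8, 0.1]

print(cosine_similarity(v_cat, v_kitten))  # close to 1.0: similar meaning
print(cosine_similarity(v_cat, v_car))     # much lower: unrelated meaning
```

A similarity close to 1.0 means the two texts are semantically close; values near 0 indicate unrelated content.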

For a list of supported models, see Embeddings models.

For a list of supported multilingual languages, see Supported text embedding languages.

Text embeddings use cases

Some common use cases for text embeddings include:

  • Semantic search: Search text ranked by semantic similarity.
  • Classification: Return the class of items with attributes similar to the given text.
  • Clustering: Cluster items whose text attributes are similar to the given text.
  • Outlier detection: Return items where text attributes are least related to the given text.
  • Conversational interface: Cluster groups of sentences that can lead to similar responses, like in a conversation-level embedding space.
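As a sketch of the semantic search use case above, the snippet below ranks a small corpus against a query by cosine similarity. All embedding values here are hypothetical placeholders standing in for vectors that, in practice, the Text Embedding API would produce.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical precomputed embeddings for a tiny document corpus.
corpus = {
    "How to bake sourdough bread": [0.90, 0.10, 0.00],
    "Intro to machine learning": [0.10, 0.90, 0.10],
    "Training neural networks": [0.00, 0.80, 0.20],
}

# Hypothetical embedding of the query "deep learning basics".
query_embedding = [0.05, 0.85, 0.15]

# Rank documents by semantic similarity to the query, most similar first.
ranked = sorted(
    corpus,
    key=lambda doc: cosine_similarity(corpus[doc], query_embedding),
    reverse=True,
)
print(ranked)
```

The machine learning documents rank above the baking document even though the query shares no keywords with them, which is the point of semantic search over plain keyword matching.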

Example use case: Develop a book recommendation chatbot

If you want to develop a book recommendation chatbot, the first thing to do is to use a deep neural network (DNN) to convert each book into an embedding vector, where one embedding vector represents one book. You can feed the book title or text content as input to the DNN. Alternatively, you can use both of these inputs together, along with any other metadata describing the book, such as the genre.

The embeddings in this example could include thousands of book titles with summaries and their genres. It might have representations for books like Wuthering Heights by Emily Brontë and Persuasion by Jane Austen that are similar to each other (a small distance between their numerical representations). In contrast, the numerical representation of the book The Great Gatsby by F. Scott Fitzgerald would be further away, as the time period, genre, and summary are less similar.
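The distance comparison described above can be sketched with Euclidean distance. The three-dimensional book vectors below are invented for illustration; embeddings from a real DNN would have hundreds of dimensions derived from the title, summary, and genre.

```python
import math

def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Return the Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 3-dimensional book embeddings (placeholder values only).
wuthering_heights = [0.82, 0.10, 0.05]
persuasion = [0.78, 0.15, 0.07]
great_gatsby = [0.20, 0.70, 0.60]

print(euclidean_distance(wuthering_heights, persuasion))    # small: similar books
print(euclidean_distance(wuthering_heights, great_gatsby))  # large: less similar
```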

The inputs are the main influence on the orientation of the embedding space. For example, if you use only book titles as input, then two books with similar titles but very different summaries could be close together. However, if you include both the title and the summary, then these same books are less similar (further apart) in the embedding space.

By combining these embeddings with Generative AI, the book recommendation chatbot could summarize, suggest, and show you books that you might like (or dislike) based on your query.

What's next