[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-25。"],[],[],null,["# Embeddings APIs overview\n\nEmbeddings are numerical representations of text, images, or videos that capture\nrelationships between inputs. Machine learning models, especially generative\nAI models, are suited for creating embeddings by identifying patterns within\nlarge datasets. Applications can use embeddings to process and produce\nlanguage, recognizing complex meanings and semantic relationships specific to\nyour content. You interact with embeddings every time you complete a\nGoogle Search or see music streaming recommendations.\n\nEmbeddings work by converting text, image, and video into arrays of floating\npoint numbers, called vectors. These vectors are designed to capture the meaning\nof the text, images, and videos. The length of the embedding array is called the\nvector's dimensionality. For example, one passage of text might be represented\nby a vector containing hundreds of dimensions. Then, by calculating the\nnumerical distance between the vector representations of two pieces of text, an\napplication can determine the similarity between the objects.\n\nVertex AI supports two types of embeddings models, text and multimodal.\n\nText embeddings use cases\n-------------------------\n\nSome common use cases for text embeddings include:\n\n- **Semantic search**: Search text ranked by semantic similarity.\n- **Classification**: Return the class of items whose text attributes are similar to the given text.\n- **Clustering**: Cluster items whose text attributes are similar to the given text.\n- **Outlier Detection**: Return items where text attributes are least related to the given text.\n- **Conversational interface**: Clusters groups of sentences which can lead to similar responses, like in a conversation-level embedding space.\n\n### Example use case: Develop a book recommendation chatbot\n\nIf you want to develop a book recommendation chatbot, the first thing to do is\nto use a deep neural network (DNN) to convert each book into an embedding\nvector, where one embedding vector represents one book. You can feed, as input\nto the DNN, just the book title or just the text content. Or you can use both of\nthese together, along with any other metadata describing the book, such as the\ngenre.\n\nThe embeddings in this example could be comprised of thousands of book titles\nwith summaries and their genre, and it might have representations for books like\n*Wuthering Heights* by Emily Brontë and *Persuasion* by Jane Austen that are\nsimilar to each other (small distance between numerical representation). Whereas\nthe numerical representation for the book *The Great Gatsby* by F. Scott\nFitzgerald would be further, as the time period, genre, and summary is less\nsimilar.\n\nThe inputs are the main influence to the orientation of the embedding space. For\nexample, if we only had book title inputs, then two books with similar titles,\nbut very different summaries, could be close together. 
Working with generative AI, this book-suggestion chatbot could summarize,
suggest, and show you books that you might like (or dislike), based on your
query.

Multimodal embeddings use cases
-------------------------------

Some common use cases for multimodal embeddings include:

- Image and text use cases:

  - **Image classification**: Takes an image as input and predicts one or more classes (labels).
  - **Image search**: Search for relevant or similar images.
  - **Recommendations**: Generate product or ad recommendations based on images.

- Image, text, and video use cases:

  - **Recommendations**: Generate product or advertisement recommendations based on videos (similarity search).
  - **Video content search**:
    - **Using semantic search**: Take text as input, and return a set of ranked frames matching the query.
    - **Using similarity search**:
      - Take a video as input, and return a set of videos matching the query.
      - Take an image as input, and return a set of videos matching the query.
  - **Video classification**: Takes a video as input and predicts one or more classes.

### Example use case: Online retail experience

Online retailers are increasingly using multimodal embeddings to enhance the
customer experience. Every time you see personalized product recommendations
while shopping or get visual results from a text search, you are interacting
with embeddings.

If you want to create a multimodal embedding for an online retail use case,
start by processing each product image to generate a unique image embedding,
which is a mathematical representation of its visual style, color palette, key
details, and more. At the same time, convert product descriptions, customer
reviews, and other relevant textual data into text embeddings that capture their
semantic meaning and context. By merging these image and text embeddings into a
unified search and recommendation engine, the store can offer personalized
recommendations of visually similar items based on a customer's browsing history
and preferences. It also lets customers search for products using natural
language descriptions, with the engine retrieving and displaying the most
visually similar items that match their search query. For example, if a
customer searches for "black summer dress", the search engine can display
dresses that are black, cut for summer, made of lighter material, and possibly
sleeveless. This powerful combination of visual and textual understanding
creates a streamlined shopping experience that enhances customer engagement and
satisfaction, and can ultimately drive sales.
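As a rough illustration of the retail scenario, the sketch below embeds a product photo and a text query into the same multimodal embedding space and scores how well they match, using the Vertex AI SDK for Python. The project ID, location, image URI, query text, and model version are placeholders and assumptions.

```python
# Minimal sketch: embed a product photo and a text query into a shared
# multimodal embedding space, then score how well they match.
# Assumes placeholder project ID, location, image URI, and model version.
import numpy as np
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

product_image = Image.load_from_file("gs://your-bucket/products/dress-123.jpg")
response = model.get_embeddings(
    image=product_image,
    contextual_text="black summer dress",
    dimension=1408,  # supported dimensions include 128, 256, 512, and 1408
)

image_vector = np.array(response.image_embedding)
text_vector = np.array(response.text_embedding)

# Because both vectors live in the same space, a text query can be scored
# directly against stored product-image vectors.
score = float(
    np.dot(image_vector, text_vector)
    / (np.linalg.norm(image_vector) * np.linalg.norm(text_vector))
)
print(f"query-to-image similarity: {score:.3f}")
```

In a production system, the product-image vectors would typically be precomputed and stored in a vector index (for example, Vector Search), and each incoming query would be embedded once and matched against that index.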
What's next
-----------

- To learn more about embeddings, see [Meet AI's multitool: Vector embeddings](https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings).
- To take a foundational ML crash course on embeddings, see [Embeddings](https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture).
- To learn more about how to store vector embeddings in a database, see the [Discover](/discover/what-is-a-vector-database) page and the [Overview of Vector Search](/vertex-ai/docs/vector-search/overview).
- To learn about responsible AI best practices and Vertex AI's safety filters, see [Responsible AI](/vertex-ai/generative-ai/docs/learn/responsible-ai).
- To learn how to get embeddings, see the following documents:
  - [Get text embeddings](/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings)
  - [Get multimodal embeddings](/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings)

Last updated 2025-08-25 UTC.