You can create a text embedding using the Vertex AI Text embeddings API. Text embeddings are numerical representations of text that capture relationships between words and phrases. Machine learning models, especially generative AI models, are suited for creating these embeddings by identifying patterns within large text datasets. Your application can use text embeddings to process and produce language, and recognize complex meanings and semantic relationships specific to your content. You interact with text embeddings every time you complete a Google Search or see music streaming recommendations.
Some common use cases for text embeddings include:
- Semantic search: Search text ranked by semantic similarity.
- Classification: Return the class of items whose text attributes are similar to the given text.
- Clustering: Cluster items whose text attributes are similar to the given text.
- Outlier Detection: Return items where text attributes are least related to the given text.
- Conversational interface: Cluster groups of sentences that can lead to similar responses, as in a conversation-level embedding space.
Text embeddings work by converting text into arrays of floating point numbers, called vectors. These vectors are designed to capture the meaning of the text. The length of the embedding array is called the vector's dimensionality. For example, one passage of text might be represented by a vector containing hundreds of dimensions. Then, by calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects.
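The distance calculation described above can be sketched in a few lines. This is a minimal illustration that assumes cosine similarity as the measure (the text does not name a specific one) and uses made-up 4-dimensional vectors in place of real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings, invented for illustration.
# Real embeddings have hundreds of dimensions.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.1]

# Related texts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

A score near 1 means the two vectors point in nearly the same direction, and a score near 0 means they are unrelated.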
The Vertex AI Text embeddings API uses dense vector representations: textembedding-gecko, for example, uses 768-dimensional vectors. Dense vector embedding models use deep-learning methods similar to the ones used by large language models. Unlike sparse vectors, which tend to directly map words to numbers, dense vectors are designed to better represent the meaning of a piece of text. The benefit of using dense vector embeddings in generative AI is that instead of searching for direct word or syntax matches, you can better search for passages that align with the meaning of the query, even if the passages don't use the same language.
- To learn more about embeddings, see Meet AI's multitool: Vector embeddings.
- To take a foundational ML crash course on embeddings, see Embeddings.
- To learn more about how to store vector embeddings in a database, see the Discover page and the Overview of Vector Search.
- To learn about text embedding models, see Text embeddings.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Enable the Vertex AI API.
Get text embeddings for a snippet of text
You can get text embeddings for a snippet of text by using the Vertex AI API or the Vertex AI SDK for Python. Each request is limited to 250 input texts in us-central1 and to 5 input texts in other regions. Each input text has a token limit of 2048. Inputs longer than this limit are silently truncated. You can disable silent truncation by setting autoTruncate to false.
These examples use the text-embedding-004 model.
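When you have more texts than one request allows, you can split them into batches before calling the API. The following is a minimal sketch that applies the per-request limits stated above. The helper names and the europe-west4 region in the usage lines are illustrative assumptions, not part of the API.

```python
# Per-request limits stated above: 250 texts in us-central1, 5 elsewhere.
US_CENTRAL1_LIMIT = 250
OTHER_REGION_LIMIT = 5

def max_texts_per_request(region):
    """Return how many input texts one request may carry in this region."""
    return US_CENTRAL1_LIMIT if region == "us-central1" else OTHER_REGION_LIMIT

def batch_texts(texts, region):
    """Split texts into lists that each fit in one request for the region."""
    size = max_texts_per_request(region)
    return [texts[i:i + size] for i in range(0, len(texts), size)]

# Hypothetical input: 12 short documents sent to a region other than us-central1.
texts = [f"document {n}" for n in range(12)]
for batch in batch_texts(texts, "europe-west4"):
    print(len(batch))  # prints 5, 5, 2
```

This only groups the texts. Each text is still subject to the 2048-token limit on its own.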
REST
To get text embeddings, send a POST request by specifying the model ID of the publisher model.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- TEXT: The text that you want to generate embeddings for. Limit: five texts of up to 2,048 tokens per text.
- AUTO_TRUNCATE: If set to false, text that exceeds the token limit causes the request to fail. The default value is true.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict
Request JSON body:
{ "instances": [ { "content": "TEXT"} ], "parameters": { "autoTruncate": AUTO_TRUNCATE } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict" | Select-Object -Expand Content
You should receive a JSON response similar to the following. Note that values
has been truncated to save space.
Example curl command
MODEL_ID="text-embedding-004"
PROJECT_ID=PROJECT_ID
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:predict" -d \
$'{
"instances": [
{ "content": "What is life?"}
]
}'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
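The following is not the SDK sample. It is a minimal sketch that uses only the Python standard library to build the same POST request shown in the REST instructions earlier in this section. The function name, the my-project project ID, and the ACCESS_TOKEN string are placeholders chosen for illustration. The request is constructed but not sent.

```python
import json
import urllib.request

def build_predict_request(project_id, texts, access_token,
                          model_id="text-embedding-004",
                          location="us-central1", auto_truncate=True):
    """Build (but do not send) the POST request for the :predict endpoint."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:predict"
    )
    body = {
        "instances": [{"content": text} for text in texts],
        "parameters": {"autoTruncate": auto_truncate},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )

# "my-project" and "ACCESS_TOKEN" are placeholders, not real values.
request = build_predict_request("my-project", ["What is life?"], "ACCESS_TOKEN")
print(request.full_url)
print(request.data.decode("utf-8"))
```

To send the request, pass it to urllib.request.urlopen with a real token, for example the output of gcloud auth print-access-token.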
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Add an embedding to a vector database
After you've generated your embeddings, you can add them to a vector database, like Vector Search. This enables low-latency retrieval, which becomes critical as the size of your data increases.
To learn more about Vector Search, see Overview of Vector Search.
Example use case: Develop a book recommendation chatbot
If you want to develop a book recommendation chatbot, the first thing to do is to use a deep neural network (DNN) to convert each book into an embedding vector, where one embedding vector represents one book. You can feed, as input to the DNN, just the book title or just the text content. Or you can use both of these together, along with any other metadata describing the book, such as the genre.
The embeddings in this example could comprise thousands of book titles with their summaries and genres. Books like Wuthering Heights by Emily Brontë and Persuasion by Jane Austen might have representations that are similar to each other (a small distance between their numerical representations), whereas the numerical representation for The Great Gatsby by F. Scott Fitzgerald would be farther away, because its time period, genre, and summary are less similar.
The inputs are the main influence on the orientation of the embedding space. For example, if we only had book title inputs, then two books with similar titles but very different summaries could be close together. However, if we include both the title and the summary, then these same books are less similar (farther apart) in the embedding space.
Working with generative AI, this book-suggestion chatbot could summarize, suggest, and show you books which you might like (or dislike), based on your query.
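The retrieval step of such a chatbot can be sketched as a nearest-neighbor lookup. This is a toy illustration only: the 3-dimensional vectors are invented rather than produced by an embedding model, cosine similarity is an assumed choice of measure, and a real system would query a vector database instead of scanning a dictionary.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration only.
BOOK_EMBEDDINGS = {
    "Wuthering Heights": [0.9, 0.8, 0.1],
    "Persuasion": [0.8, 0.9, 0.2],
    "The Great Gatsby": [0.2, 0.3, 0.9],
}

def recommend_similar(title, k=1):
    """Return the k other titles whose embeddings are closest to this one."""
    query = BOOK_EMBEDDINGS[title]
    candidates = [t for t in BOOK_EMBEDDINGS if t != title]
    candidates.sort(
        key=lambda t: cosine_similarity(query, BOOK_EMBEDDINGS[t]),
        reverse=True,
    )
    return candidates[:k]

print(recommend_similar("Persuasion"))  # ['Wuthering Heights']
```

With these toy vectors, Persuasion sits close to Wuthering Heights and far from The Great Gatsby, which mirrors the relationships described above.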
API changes to models released on or after August 2023
Model versions released on or after August 2023, including text-embedding-004 and textembedding-gecko-multilingual@001, accept a new task_type parameter and an optional title parameter (only valid with task_type=RETRIEVAL_DOCUMENT). These new parameters apply to these public preview models and to all stable models going forward.
{
"instances": [
{
"task_type": "RETRIEVAL_DOCUMENT",
"title": "document title",
"content": "I would like embeddings for this text!"
}
]
}
The task_type parameter specifies the intended downstream application, which helps the model produce better quality embeddings. It is a string that can take one of the following values:
task_type | Description |
---|---|
RETRIEVAL_QUERY | Specifies the given text is a query in a search or retrieval setting. |
RETRIEVAL_DOCUMENT | Specifies the given text is a document in a search or retrieval setting. |
SEMANTIC_SIMILARITY | Specifies the given text is used for Semantic Textual Similarity (STS). |
CLASSIFICATION | Specifies that the embedding is used for classification. |
CLUSTERING | Specifies that the embedding is used for clustering. |
QUESTION_ANSWERING | Specifies that the query embedding is used for answering questions. Use RETRIEVAL_DOCUMENT for the document side. |
FACT_VERIFICATION | Specifies that the query embedding is used for fact verification. |
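The rules for task_type and title can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical build_instance helper. It validates against the task types listed above and rejects a title on any task type other than RETRIEVAL_DOCUMENT. How the service itself responds to invalid combinations is not covered here.

```python
import json

# Task types listed in the table above.
TASK_TYPES = {
    "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION",
}

def build_instance(content, task_type, title=None):
    """Build one entry for the "instances" array of a request body."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type}")
    if title is not None and task_type != "RETRIEVAL_DOCUMENT":
        raise ValueError("title is only valid with task_type=RETRIEVAL_DOCUMENT")
    instance = {"task_type": task_type, "content": content}
    if title is not None:
        instance["title"] = title
    return instance

body = {
    "instances": [
        build_instance("I would like embeddings for this text!",
                       "RETRIEVAL_DOCUMENT", title="document title"),
        build_instance("What is life?", "RETRIEVAL_QUERY"),
    ]
}
print(json.dumps(body, indent=2))
```

Embedding documents with RETRIEVAL_DOCUMENT and queries with RETRIEVAL_QUERY, as in the two instances here, matches the search setting described in the table.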
What's next
- To get batch predictions for embeddings, see Get batch text embeddings predictions.
- To learn more about multimodal embeddings, see Get multimodal embeddings.
- To tune an embedding, see Tune text embeddings.