This document describes how to create a text embedding using the Vertex AI Text embeddings API.
The Vertex AI text embeddings API uses dense vector representations: textembedding-gecko, for example, uses 768-dimensional vectors. Dense vector embedding models use deep-learning methods similar to the ones used by large language models. Unlike sparse vectors, which tend to directly map words to numbers, dense vectors are designed to better represent the meaning of a piece of text. The benefit of using dense vector embeddings in generative AI is that instead of searching for direct word or syntax matches, you can better search for passages that align to the meaning of the query, even if the passages don't use the same language.
The vectors are normalized, so you can use cosine similarity, dot product, or Euclidean distance to provide the same similarity rankings.
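The claim about identical rankings can be checked with a small sketch. The vectors below are toy 3-dimensional stand-ins, not real model output, and the helper names are illustrative only. For unit-length vectors, the squared Euclidean distance equals 2 minus twice the dot product, so all three measures order candidates the same way.

```python
import math

def normalize(v):
    """Scale a vector to unit length, as the API's output vectors are."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors standing in for real 768-dimensional embeddings (assumption).
query = normalize([0.9, 0.1, 0.3])
passages = {
    "a": normalize([0.8, 0.2, 0.4]),
    "b": normalize([0.1, 0.9, 0.2]),
    "c": normalize([0.5, 0.5, 0.5]),
}

# Higher is better for cosine and dot product; lower is better for distance.
by_cosine = sorted(passages, key=lambda k: cosine(query, passages[k]), reverse=True)
by_dot = sorted(passages, key=lambda k: dot(query, passages[k]), reverse=True)
by_euclidean = sorted(passages, key=lambda k: euclidean(query, passages[k]))

print(by_cosine, by_dot, by_euclidean)
```

All three lists come out in the same order, which is why the choice of metric does not change the ranking for normalized vectors.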
- To learn more about embeddings, see the embeddings APIs overview.
- To learn about text embedding models, see Text embeddings.
- For information about which languages each embeddings model supports, see Supported text languages.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Enable the Vertex AI API.
- Choose a task type for your embeddings job.
Supported models
You can get text embeddings by using the following models:
English models | Multilingual models |
---|---|
textembedding-gecko@001 | textembedding-gecko-multilingual@001 |
textembedding-gecko@003 | text-multilingual-embedding-002 |
text-embedding-004 | |
text-embedding-005 | |
If you are new to these models, we recommend that you use the latest versions.
For English text, use text-embedding-005. For multilingual text, use text-multilingual-embedding-002.
Get text embeddings for a snippet of text
You can get text embeddings for a snippet of text by using the Vertex AI API or the Vertex AI SDK for Python. Each request is limited to 250 input texts in us-central1 and to 5 input texts in other regions. The API has a maximum input token limit of 20,000; inputs that exceed this limit result in a 500 error. Each individual input text is further limited to 2,048 tokens, and any excess is silently truncated. To disable silent truncation, set autoTruncate to false.
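One way to stay within these limits is to split a large list of texts into batches before sending requests. The following is a minimal sketch, not part of any SDK: `batch_texts` and `estimate_tokens` are hypothetical helpers, and the whitespace-based token estimate is only a rough stand-in for the model's real tokenizer.

```python
def estimate_tokens(text):
    # Hypothetical stand-in: counts whitespace-separated words.
    # Real token counts depend on the model's tokenizer.
    return len(text.split())

def batch_texts(texts, max_texts=250, max_tokens=20000):
    """Group texts so each batch respects the per-request text and token limits.

    Defaults mirror the us-central1 limits stated above; pass max_texts=5
    for other regions.
    """
    batches, current, current_tokens = [], [], 0
    for text in texts:
        tokens = estimate_tokens(text)
        # Start a new batch when adding this text would exceed either limit.
        if current and (len(current) >= max_texts or current_tokens + tokens > max_tokens):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches

batches = batch_texts(["hello world"] * 600)
print([len(batch) for batch in batches])
```

With 600 short texts, this yields batches of 250, 250, and 100 texts. A single text whose estimate exceeds the token budget still gets its own batch, so the caller would need to handle that case separately.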
All models produce an output with 768 dimensions by default. However, the following models give users the option to choose an output dimensionality between 1 and 768. By selecting a smaller output dimensionality, users can save memory and storage space, leading to more efficient computations.
- text-embedding-005
- text-multilingual-embedding-002
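To make the storage savings concrete, here is a rough back-of-the-envelope sketch. It assumes each value is stored as a 4-byte float, which is a common choice but not something this page specifies. The `outputDimensionality` parameter name is taken from the model reference rather than from this page, so verify it there before relying on it.

```python
def embedding_storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for a set of embeddings, ignoring index and metadata overhead."""
    return num_vectors * dimensions * bytes_per_value

# One million vectors at the default and at a reduced dimensionality.
full = embedding_storage_bytes(1_000_000, 768)
reduced = embedding_storage_bytes(1_000_000, 256)
print(full, reduced, full / reduced)

# Request parameters asking for 256-dimensional output
# (parameter name assumed from the model reference).
parameters = {"autoTruncate": True, "outputDimensionality": 256}
```

Under these assumptions, dropping from 768 to 256 dimensions cuts raw vector storage to one third.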
The following examples use the text-embedding-004 model.
To get text embeddings, send a POST request by specifying the model ID of the publisher model.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- TEXT: The text that you want to generate embeddings for. Limit: five texts of up to 2,048 tokens per text for all models except textembedding-gecko@001. The max input token length for textembedding-gecko@001 is 3072.
- AUTO_TRUNCATE: If set to false, text that exceeds the token limit causes the request to fail. The default value is true.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict
Request JSON body:
{ "instances": [ { "content": "TEXT" } ], "parameters": { "autoTruncate": AUTO_TRUNCATE } }
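The same request can be assembled in Python using only the standard library. This is a sketch rather than the SDK's approach: `build_embedding_request` is a hypothetical helper, "my-project" and "ACCESS_TOKEN" are placeholders, and the request is built but not sent.

```python
import json
import urllib.request

def build_embedding_request(project_id, text, access_token, auto_truncate=True,
                            model_id="text-embedding-004", location="us-central1"):
    """Build (but do not send) a predict request for a single input text."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:predict"
    )
    body = {
        "instances": [{"content": text}],
        "parameters": {"autoTruncate": auto_truncate},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )

# Placeholder values; a real token would come from
# `gcloud auth print-access-token`.
request = build_embedding_request("my-project", "What is life?", "ACCESS_TOKEN")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it, provided the token is valid.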
To send your request, choose one of these options:
Using curl, save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict"
Using PowerShell, save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict" | Select-Object -Expand Content
You should receive a JSON response similar to the following. Note that the values array has been truncated to save space.
Response
{ "predictions": [ { "embeddings": { "statistics": { "truncated": false, "token_count": 6 }, "values": [ ... ] } } ] }
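A response of this shape can be unpacked with a few lines of Python. The sketch below follows the field names in the sample response above. `extract_embeddings` is a hypothetical helper, and the three numbers in `values` are made-up stand-ins for the elided vector.

```python
import json

# Same structure as the sample response; the values are placeholders.
sample_response = """
{ "predictions": [ { "embeddings": {
    "statistics": { "truncated": false, "token_count": 6 },
    "values": [0.1, 0.2, 0.3] } } ] }
"""

def extract_embeddings(response_text):
    """Return one dict per prediction with the vector and its statistics."""
    payload = json.loads(response_text)
    results = []
    for prediction in payload["predictions"]:
        embedding = prediction["embeddings"]
        stats = embedding["statistics"]
        results.append({
            "values": embedding["values"],
            "truncated": stats["truncated"],
            "token_count": stats["token_count"],
        })
    return results

results = extract_embeddings(sample_response)
print(results[0]["token_count"], results[0]["truncated"], len(results[0]["values"]))
```

Checking the `truncated` flag after each call is a simple way to notice inputs that exceeded the per-text token limit.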
Example curl command
MODEL_ID="text-embedding-004"
PROJECT_ID=PROJECT_ID
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:predict" -d \
$'{
"instances": [
{ "content": "What is life?"}
]
}'
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
Latest model
text-embedding-005 supports a new task type, CODE_RETRIEVAL_QUERY, which can be used to retrieve relevant code blocks using plain text queries. To use this feature, embed code blocks using the RETRIEVAL_DOCUMENT task type, and embed text queries using CODE_RETRIEVAL_QUERY.
To explore all task types, see the model reference.
Here is an example:
PROJECT_ID=PROJECT_ID
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/text-embedding-005:predict" -d \
$'{
"instances": [
{
"task_type": "CODE_RETRIEVAL_QUERY",
"content": "Function to add two numbers"
}
]
}'
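The two sides of code retrieval use different task types, which the following sketch makes explicit by building both request bodies. `build_code_retrieval_bodies` is a hypothetical helper, and the code snippets are illustrative. The field names follow the curl example above.

```python
import json

def build_code_retrieval_bodies(code_blocks, query):
    """Build request bodies for indexing code blocks and for a plain-text query."""
    # Code blocks are embedded as documents.
    document_body = {
        "instances": [
            {"task_type": "RETRIEVAL_DOCUMENT", "content": block}
            for block in code_blocks
        ]
    }
    # The natural-language query uses the code-specific query task type.
    query_body = {
        "instances": [{"task_type": "CODE_RETRIEVAL_QUERY", "content": query}]
    }
    return document_body, query_body

code_blocks = [
    "def add(a, b):\n    return a + b",
    "def greet(name):\n    return 'Hello, ' + name",
]
document_body, query_body = build_code_retrieval_bodies(
    code_blocks, "Function to add two numbers"
)
print(json.dumps(query_body, indent=2))
```

Each body would be sent to the text-embedding-005 predict endpoint, and the query embedding would then be compared against the stored code-block embeddings.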
The following limitations apply when using these models:
- Don't use these preview models on mission critical or production systems.
- These models are available in us-central1 only.
- Batch predictions are not supported.
- Customization is not supported.
Add an embedding to a vector database
After you've generated your embedding, you can add it to a vector database, such as Vector Search. This enables low-latency retrieval, which becomes critical as the size of your data increases.
To learn more about Vector Search, see Overview of Vector Search.
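Before loading embeddings into a vector database, you typically serialize them into the input format it expects. The sketch below writes one JSON object per line with `id` and `embedding` fields. That layout is an assumption here, so check it against the Vector Search input format documentation. `to_json_lines` is a hypothetical helper, and the vectors are shortened placeholders.

```python
import json

def to_json_lines(embeddings_by_id):
    """Serialize {id: vector} pairs as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps({"id": str(item_id), "embedding": vector})
        for item_id, vector in embeddings_by_id.items()
    )

# Placeholder vectors; real ones would have 768 values by default.
embeddings_by_id = {
    "doc-1": [0.1, 0.2],
    "doc-2": [0.3, 0.4],
}
json_lines = to_json_lines(embeddings_by_id)
print(json_lines)
```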
What's next
- To learn more about rate limits, see Generative AI on Vertex AI rate limits.
- To get batch predictions for embeddings, see Get batch text embeddings predictions
- To learn more about multimodal embeddings, see Get multimodal embeddings
- To tune an embedding, see Tune text embeddings
- To learn more about the research behind text-embedding-005 and text-multilingual-embedding-002, see the research paper Gecko: Versatile Text Embeddings Distilled from Large Language Models.