Get text embeddings

The Vertex AI text-embeddings API lets you create a text embedding using Generative AI on Vertex AI. Text embeddings are numerical representations of text that capture relationships between words and phrases. Machine learning models, especially generative AI models, are suited for creating these embeddings by identifying patterns within large text datasets. Your application can use text embeddings to process and produce language, recognizing complex meanings and semantic relationships specific to your content. You interact with text embeddings every time you complete a Google Search or see music streaming recommendations.

Some common use cases for text embeddings include:

  • Semantic search: Search text ranked by semantic similarity.
  • Classification: Return the class of items whose text attributes are similar to the given text.
  • Clustering: Cluster items whose text attributes are similar to the given text.
  • Outlier Detection: Return items where text attributes are least related to the given text.
  • Conversational interface: Cluster groups of sentences that can lead to similar responses, as in a conversation-level embedding space.
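As one illustration of the classification use case, the following minimal sketch assigns a vector to the label whose examples it sits closest to. It assumes the embeddings were already computed; the two-dimensional vectors, the labels, and the function names centroid and classify are hypothetical and chosen only to keep the example readable.

```python
import math
from typing import Dict, List


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(vector: List[float], labeled: Dict[str, List[List[float]]]) -> str:
    """Return the label whose centroid is closest (Euclidean) to the vector."""
    centroids = {label: centroid(vecs) for label, vecs in labeled.items()}
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Toy 2-dimensional vectors standing in for real embeddings (hypothetical values).
labeled_examples = {
    "sports": [[0.9, 0.1], [0.8, 0.2]],
    "finance": [[0.1, 0.9], [0.2, 0.8]],
}

predicted = classify([0.7, 0.3], labeled_examples)
```

Nearest-centroid classification is only one possible approach; a trained classifier on top of the embeddings may work better for real data.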

Text embeddings work by converting text into arrays of floating point numbers, called vectors. These vectors are designed to capture the meaning of the text. The length of the embedding array is called the vector's dimensionality. For example, one passage of text might be represented by a vector containing hundreds of dimensions. Then, by calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects.
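To make the idea of distance between vectors concrete, here is a minimal sketch that computes the dimensionality, the cosine similarity, and the Euclidean distance of a few vectors. The four-dimensional vectors are made-up values, not real model output, and cosine similarity is only one common choice of measure.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Toy 4-dimensional vectors (hypothetical values, not real embeddings).
king = [0.8, 0.6, 0.1, 0.0]
queen = [0.7, 0.7, 0.2, 0.0]
banana = [0.0, 0.1, 0.9, 0.4]

dimensionality = len(king)                      # length of the embedding array
similar = cosine_similarity(king, queen)        # close to 1: similar direction
dissimilar = cosine_similarity(king, banana)    # much lower: unrelated direction
euclidean = math.dist(king, queen)              # straight-line distance
```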

The Vertex AI text embeddings API uses dense vector representations: textembedding-gecko, for example, uses 768-dimensional vectors. Dense vector embedding models use deep-learning methods similar to the ones used by large language models. Unlike sparse vectors, which tend to directly map words to numbers, dense vectors are designed to better represent the meaning of a piece of text. The benefit of using dense vector embeddings in generative AI is that instead of searching for direct word or syntax matches, you can better search for passages that align to the meaning of the query, even if the passages don't use the same language.
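The limitation of sparse, word-based vectors can be seen in a small sketch. The word-count representation below is a simplified stand-in for sparse methods in general; the helper names and sample phrases are hypothetical. The dense side is not shown because it requires a call to an embedding model.

```python
from collections import Counter
from typing import Dict


def bag_of_words(text: str) -> Dict[str, int]:
    """Sparse representation: one count per distinct lowercase word."""
    return dict(Counter(text.lower().split()))


def sparse_overlap(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Dot product of two sparse word-count vectors."""
    return sum(count * b.get(word, 0) for word, count in a.items())


query = bag_of_words("cheap automobile repair")
passage_same_meaning = bag_of_words("affordable car maintenance")
passage_same_words = bag_of_words("cheap automobile wallpaper")

# No shared words, so the sparse score is zero despite the shared meaning.
no_match = sparse_overlap(query, passage_same_meaning)
# Two shared words, so the sparse score is positive despite the unrelated topic.
word_match = sparse_overlap(query, passage_same_words)
```

A dense embedding model is designed to score the first passage as the closer one, because it represents meaning and not exact word matches.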


There are specific prerequisites for successfully creating an embedding. To get started, see quickstart: Try text embeddings.

Use this colab to call the newly released text embedding models (textembedding-gecko and textembedding-gecko-multilingual).

Jupyter notebook: Call the text embedding models using Colab or a Jupyter notebook.

Example use case: Develop a book recommendation chatbot

If you want to develop a book recommendation chatbot, the first thing to do is to use a deep neural network (DNN) to convert each book into an embedding vector, where one embedding vector represents one book. You can feed, as input to the DNN, just the book title or just the text content. Or you can use both of these together, along with any other metadata describing the book, such as the genre.

The embeddings in this example could comprise thousands of book titles with their summaries and genres. Books such as Wuthering Heights by Emily Brontë and Persuasion by Jane Austen might have representations that are similar to each other (a small distance between their numerical representations), whereas the representation of The Great Gatsby by F. Scott Fitzgerald would be further away, because its time period, genre, and summary are less similar.

The inputs are the main influence on the orientation of the embedding space. For example, if the only inputs were book titles, two books with similar titles but very different summaries could be close together. If the inputs include both the title and the summary, those same books are less similar (further apart) in the embedding space.
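The choice of inputs can be sketched as a small helper that assembles the text sent to the embedding model for each book. The Book class, the embedding_input function, and the "Title:/Genre:/Summary:" layout are assumptions made for illustration; any consistent layout could be used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    title: str
    summary: Optional[str] = None
    genre: Optional[str] = None


def embedding_input(book: Book, include_summary: bool = True) -> str:
    """Build the text that would be embedded for one book."""
    parts = [f"Title: {book.title}"]
    if book.genre:
        parts.append(f"Genre: {book.genre}")
    if include_summary and book.summary:
        parts.append(f"Summary: {book.summary}")
    return "\n".join(parts)


book = Book(
    title="Persuasion",
    summary="Anne Elliot meets Captain Wentworth again, years after ending their engagement.",
    genre="Romance",
)

title_only = embedding_input(Book(title="Persuasion"))  # title is the only signal
full_text = embedding_input(book)                       # title, genre, and summary
```

Embedding full_text instead of title_only gives the model more signal, which is what moves books with similar titles but different content further apart.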

Working with generative AI, this book-suggestion chatbot could summarize, suggest, and show you books which you might like (or dislike), based on your query.

Supported models

To learn which stable text embedding model versions are available, see Available stable model versions. To learn which latest text embedding model versions are available, see Latest models.

It is strongly recommended to specify a stable model version (for example, text-embedding-004). The latest version of a model is in Preview and is not General Availability (GA). Because the latest version is in Preview, it isn't promised to be production ready.

It is especially important to use a stable model version (for example, text-embedding-004) for applications that require backward-compatible embeddings. If backward compatibility isn't a concern and you want to use the latest model version, specify @latest explicitly. Always specify the full model name, including the version number.

Get text embeddings for a snippet of text

You can get text embeddings for a snippet of text by using the Vertex AI API or the Vertex AI SDK for Python. Each request is limited to 250 input texts in us-central1 and to 5 input texts in other regions. Each input text has a token limit of 2,000. Inputs longer than this limit are silently truncated. To disable silent truncation, set autoTruncate to false.
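Because of the per-request limits, a larger list of texts has to be split into several requests. The following minimal sketch shows one way to do that; the batch_texts helper and the constant names are hypothetical, and the limits are the ones quoted in the paragraph above.

```python
from typing import Iterator, List

# Per-request limits quoted in the text above.
MAX_TEXTS_US_CENTRAL1 = 250
MAX_TEXTS_OTHER_REGIONS = 5


def batch_texts(texts: List[str], region: str = "us-central1") -> Iterator[List[str]]:
    """Yield slices of texts small enough for one request in the given region."""
    limit = MAX_TEXTS_US_CENTRAL1 if region == "us-central1" else MAX_TEXTS_OTHER_REGIONS
    for start in range(0, len(texts), limit):
        yield texts[start : start + limit]


texts = [f"document {i}" for i in range(12)]
batches = list(batch_texts(texts, region="europe-west4"))
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be passed to a single embedding request. This sketch does not count tokens, so overly long texts are still subject to truncation.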

These examples use the text-embedding-004 model.


To get text embeddings, send a POST request by specifying the model ID of the publisher model.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TEXT: The text that you want to generate embeddings for. Limit: five texts of up to 3,072 tokens per text.
  • AUTO_TRUNCATE: If set to false, text that exceeds the token limit causes the request to fail. The default value is true.

HTTP method and URL:


Request JSON body:

{
  "instances": [
    { "content": "TEXT"}
  ],
  "parameters": {
    "autoTruncate": AUTO_TRUNCATE
  }
}

To send your request, choose one of these options:


Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \


Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "" | Select-Object -Expand Content

You should receive a JSON response similar to the following. Note that values has been truncated to save space.

Example curl command


curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
${MODEL_ID}:predict -d \
'{
  "instances": [
    { "content": "What is life?"}
  ]
}'


To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import List, Optional

from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

def embed_text(
    texts: List[str] = ["banana muffins? ", "banana bread? banana muffins?"],
    task: str = "RETRIEVAL_DOCUMENT",
    model_name: str = "text-embedding-004",
    dimensionality: Optional[int] = 256,
) -> List[List[float]]:
    """Embeds texts with a pre-trained, foundational model."""
    model = TextEmbeddingModel.from_pretrained(model_name)
    inputs = [TextEmbeddingInput(text, task) for text in texts]
    kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
    embeddings = model.get_embeddings(inputs, **kwargs)
    return [embedding.values for embedding in embeddings]


Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"regexp"

	aiplatform "cloud.google.com/go/aiplatform/apiv1"
	"cloud.google.com/go/aiplatform/apiv1/aiplatformpb"

	"google.golang.org/api/option"
	"google.golang.org/protobuf/types/known/structpb"
)

// embedTexts returns one embedding vector per input text.
func embedTexts(
	apiEndpoint, project, model string, texts []string,
	task string, customOutputDimensionality *int) ([][]float32, error) {
	ctx := context.Background()

	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
	if err != nil {
		return nil, err
	}
	defer client.Close()

	// Derive the location from the endpoint prefix; fall back to us-central1.
	match := regexp.MustCompile(`^(\w+-\w+)`).FindStringSubmatch(apiEndpoint)
	location := "us-central1"
	if match != nil {
		location = match[1]
	}
	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)

	// Build one instance per input text.
	instances := make([]*structpb.Value, len(texts))
	for i, text := range texts {
		instances[i] = structpb.NewStructValue(&structpb.Struct{
			Fields: map[string]*structpb.Value{
				"content":   structpb.NewStringValue(text),
				"task_type": structpb.NewStringValue(task),
			},
		})
	}

	// A null value leaves the output dimensionality at the model default.
	outputDimensionality := structpb.NewNullValue()
	if customOutputDimensionality != nil {
		outputDimensionality = structpb.NewNumberValue(float64(*customOutputDimensionality))
	}
	params := structpb.NewStructValue(&structpb.Struct{
		Fields: map[string]*structpb.Value{"outputDimensionality": outputDimensionality},
	})

	req := &aiplatformpb.PredictRequest{
		Endpoint:   endpoint,
		Instances:  instances,
		Parameters: params,
	}
	resp, err := client.Predict(ctx, req)
	if err != nil {
		return nil, err
	}

	// Copy the returned values into float32 slices.
	embeddings := make([][]float32, len(resp.Predictions))
	for i, prediction := range resp.Predictions {
		values := prediction.GetStructValue().Fields["embeddings"].GetStructValue().Fields["values"].GetListValue().Values
		embeddings[i] = make([]float32, len(values))
		for j, value := range values {
			embeddings[i][j] = float32(value.GetNumberValue())
		}
	}
	return embeddings, nil
}


Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

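import static java.util.stream.Collectors.toList;

import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictRequest;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.Struct;
import com.google.protobuf.Value;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredictTextEmbeddingsSample {
  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // Details about text embedding request structure and supported models are available in:
    // Regional service endpoint in host:port form.
    String endpoint = "";
    String project = "YOUR_PROJECT_ID";
    String model = "text-embedding-004";
    // The task type and output dimensionality below are illustrative values
    // that mirror the defaults of the Python sample.
    predictTextEmbeddings(
        endpoint,
        project,
        model,
        List.of("banana bread?", "banana muffins?"),
        "RETRIEVAL_DOCUMENT",
        OptionalInt.of(256));
  }

  // Gets text embeddings from a pretrained, foundational model.
  public static List<List<Float>> predictTextEmbeddings(
      String endpoint,
      String project,
      String model,
      List<String> texts,
      String task,
      OptionalInt outputDimensionality)
      throws IOException {
    PredictionServiceSettings settings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();
    // find() matches the location prefix only; matches() would require the whole
    // endpoint string to match and would always fall back to the default.
    Matcher matcher = Pattern.compile("^(?<Location>\\w+-\\w+)").matcher(endpoint);
    String location = matcher.find() ? matcher.group("Location") : "us-central1";
    EndpointName endpointName =
        EndpointName.ofProjectLocationPublisherModelName(project, location, "google", model);

    // You can use this prediction service client for multiple requests.
    try (PredictionServiceClient client = PredictionServiceClient.create(settings)) {
      PredictRequest.Builder request =
          PredictRequest.newBuilder().setEndpoint(endpointName.toString());
      if (outputDimensionality.isPresent()) {
        request.setParameters(
            Value.newBuilder()
                .setStructValue(
                    Struct.newBuilder()
                        .putFields("outputDimensionality", valueOf(outputDimensionality.getAsInt()))
                        .build()));
      }
      for (int i = 0; i < texts.size(); i++) {
        request.addInstances(
            Value.newBuilder()
                .setStructValue(
                    Struct.newBuilder()
                        .putFields("content", valueOf(texts.get(i)))
                        .putFields("taskType", valueOf(task))
                        .build()));
      }
      PredictResponse response = client.predict(request.build());
      List<List<Float>> floats = new ArrayList<>();
      for (Value prediction : response.getPredictionsList()) {
        Value embeddings = prediction.getStructValue().getFieldsOrThrow("embeddings");
        Value values = embeddings.getStructValue().getFieldsOrThrow("values");
        // Convert the protobuf number list into a list of floats.
        floats.add(
            values.getListValue().getValuesList().stream()
                .map(Value::getNumberValue)
                .map(Double::floatValue)
                .collect(toList()));
      }
      return floats;
    }
  }

  private static Value valueOf(String s) {
    return Value.newBuilder().setStringValue(s).build();
  }

  private static Value valueOf(int n) {
    return Value.newBuilder().setNumberValue(n).build();
  }
}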


Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

async function main(
  project,
  model = 'text-embedding-004',
  texts = 'banana bread?;banana muffins?',
  task = 'RETRIEVAL_DOCUMENT',
  outputDimensionality = 0,
  apiEndpoint = ''
) {
  const aiplatform = require('@google-cloud/aiplatform');
  const {PredictionServiceClient} = aiplatform.v1;
  const {helpers} = aiplatform; // helps construct protobuf.Value objects.
  const clientOptions = {apiEndpoint: apiEndpoint};
  const location = 'us-central1';
  const endpoint = `projects/${project}/locations/${location}/publishers/google/models/${model}`;
  // Request 256 dimensions unless a positive value is passed in.
  const parameters = helpers.toValue({
    outputDimensionality: outputDimensionality > 0 ? outputDimensionality : 256,
  });

  async function callPredict() {
    // `texts` is a semicolon-separated string; each piece becomes one instance.
    const instances = texts
      .split(';')
      .map(e => helpers.toValue({content: e, taskType: task}));
    const request = {endpoint, instances, parameters};
    const client = new PredictionServiceClient(clientOptions);
    const [response] = await client.predict(request);
    console.log('Got predict response');
    const predictions = response.predictions;
    for (const prediction of predictions) {
      const embeddings = prediction.structValue.fields.embeddings;
      const values = embeddings.structValue.fields.values.listValue.values;
      console.log('Got prediction: ' + JSON.stringify(values));
    }
  }

  await callPredict();
}


Add an embedding to a vector database

After you've generated your embeddings, you can add them to a vector database, such as Vector Search. This enables low-latency retrieval and becomes critical as the size of your data increases.
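To show what a vector database does conceptually, the following sketch implements a brute-force in-memory index. It is only a stand-in and does not use the Vector Search API; the InMemoryVectorIndex class, the item IDs, and the three-dimensional vectors are hypothetical. A real vector database replaces the linear scan with an approximate nearest-neighbor structure so that queries stay fast as the data grows.

```python
import heapq
import math
from typing import Dict, List, Tuple


class InMemoryVectorIndex:
    """Brute-force nearest-neighbor index; a stand-in for a real vector database."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    @staticmethod
    def _normalize(vector: List[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot use a zero vector")
        return [x / norm for x in vector]

    def add(self, item_id: str, vector: List[float]) -> None:
        """Store a unit-length copy so a dot product equals cosine similarity."""
        self._vectors[item_id] = self._normalize(vector)

    def query(self, vector: List[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k most similar (id, score) pairs, best first."""
        q = self._normalize(vector)
        scores = (
            (item_id, sum(a * b for a, b in zip(q, stored)))
            for item_id, stored in self._vectors.items()
        )
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])


index = InMemoryVectorIndex()
index.add("wuthering-heights", [0.9, 0.2, 0.1])
index.add("persuasion", [0.8, 0.3, 0.1])
index.add("great-gatsby", [0.1, 0.2, 0.9])

neighbors = index.query([0.85, 0.25, 0.1], k=2)
neighbor_ids = [item_id for item_id, _ in neighbors]
```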

To learn more about Vector Search, see Overview of Vector Search.

API changes to models released on or after August 2023

When using model versions released on or after August 2023, including text-embedding-004 and textembedding-gecko-multilingual@001, you can set a new task_type parameter and an optional title parameter (valid only with task_type=RETRIEVAL_DOCUMENT).

These new parameters apply to these public preview models and all stable models going forward.

{
  "instances": [
    {
      "task_type": "RETRIEVAL_DOCUMENT",
      "title": "document title",
      "content": "I would like embeddings for this text!"
    }
  ]
}

The task_type parameter is defined as the intended downstream application to help the model produce better quality embeddings. It is a string that can take on one of the following values:

task_type            Description
RETRIEVAL_QUERY      Specifies the given text is a query in a search or retrieval setting.
RETRIEVAL_DOCUMENT   Specifies the given text is a document in a search or retrieval setting.
SEMANTIC_SIMILARITY  Specifies the given text is used for Semantic Textual Similarity (STS).
CLASSIFICATION       Specifies that the embedding is used for classification.
CLUSTERING           Specifies that the embedding is used for clustering.
QUESTION_ANSWERING   Specifies that the query embedding is used for answering questions. Use RETRIEVAL_DOCUMENT for the document side.
FACT_VERIFICATION    Specifies that the query embedding is used for fact verification.
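The rules for task_type and title can be checked on the client before a request is sent. The following minimal sketch builds entries for the "instances" array and rejects combinations that the text above describes as invalid; the build_instance and build_request_body helpers are hypothetical and are not part of any SDK.

```python
import json
from typing import Any, Dict, List, Optional

# Task types listed in the table above.
TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
}


def build_instance(
    content: str, task_type: str, title: Optional[str] = None
) -> Dict[str, Any]:
    """Build one entry of the "instances" array, validating task_type and title."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type}")
    if title is not None and task_type != "RETRIEVAL_DOCUMENT":
        raise ValueError("title is only valid with task_type=RETRIEVAL_DOCUMENT")
    instance: Dict[str, Any] = {"task_type": task_type, "content": content}
    if title is not None:
        instance["title"] = title
    return instance


def build_request_body(instances: List[Dict[str, Any]]) -> str:
    """Serialize the instances into a JSON request body."""
    return json.dumps({"instances": instances}, indent=2)


document = build_instance(
    "I would like embeddings for this text!",
    task_type="RETRIEVAL_DOCUMENT",
    title="document title",
)
query = build_instance("What is life?", task_type="RETRIEVAL_QUERY")
body = build_request_body([document, query])
```

Embedding documents with RETRIEVAL_DOCUMENT and queries with RETRIEVAL_QUERY keeps both sides of a search in the roles that the table describes.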

Language coverage for textembedding-gecko-multilingual models

The textembedding-gecko-multilingual@001 model has been evaluated on the following languages: Arabic (ar), Bengali (bn), English (en), Spanish (es), German (de), Persian (fa), Finnish (fi), French (fr), Hindi (hi), Indonesian (id), Japanese (ja), Korean (ko), Russian (ru), Swahili (sw), Telugu (te), Thai (th), Yoruba (yo), Chinese (zh).

The following is the full list of supported languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.

What's next