Try Gemini 1.5 models, our newest multimodal models in Vertex AI, and see what you can build with a 1M token context window. Try Gemini 1.5 models, our newest multimodal models in Vertex AI, and see what you can build with a 1M token context window.

Get text embeddings

You can create a text embedding using the Vertex AI Text embeddings API. Text embeddings are numerical representations of text that capture relationships between words and phrases. Machine learning models, especially generative AI models, are suited for creating these embeddings by identifying patterns within large text datasets. Your application can use text embeddings to process and produce language, and recognize complex meanings and semantic relationships specific to your content. You interact with text embeddings every time you complete a Google Search or see music streaming recommendations.

Some common use cases for text embeddings include:

Semantic search: Search text ranked by semantic similarity.
Classification: Return the class of items whose text attributes are similar to the given text.
Clustering: Cluster items whose text attributes are similar to the given text.
Outlier Detection: Return items where text attributes are least related to the given text.
Conversational interface: Clusters groups of sentences which can lead to similar responses, like in a conversation-level embedding space.

Text embeddings work by converting text into arrays of floating point numbers, called vectors. These vectors are designed to capture the meaning of the text. The length of the embedding array is called the vector's dimensionality. For example, one passage of text might be represented by a vector containing hundreds of dimensions. Then, by calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects.

Vertex AI text embeddings API uses dense vector representations: text-embedding-gecko, for example, uses 768-dimensional vectors. Dense vector embedding models use deep-learning methods similar to the ones used by large language models. Unlike sparse vectors, which tend to directly map words to numbers, dense vectors are designed to better represent the meaning of a piece of text. The benefit of using dense vector embeddings in generative AI is that instead of searching for direct word or syntax matches, you can better search for passages that align to the meaning of the query, even if the passages don't use the same language.

To learn more about embeddings, see Meet AI's multitool: Vector embeddings.
To take a foundational ML crash course on embeddings, see Embeddings.
To learn more about how to store vector embeddings in a database, see the Discover page and the Overview of Vector Search
To learn about text embedding models, see Text embeddings.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Enable the Vertex AI API.

Enable the API

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Enable the Vertex AI API.

Enable the API

Get text embeddings for a snippet of text

You can get text embeddings for a snippet of text by using the Vertex AI API or the Vertex AI SDK for Python. For each request, you're limited to 250 input texts in us-central1, and in other regions, the max input text is 5. Each input text has a token limit of 2048. Inputs longer than this length are silently truncated. You can also disable silent truncation by setting autoTruncate to false.

These examples use the text-embedding-004 model.

REST

To get text embeddings, send a POST request by specifying the model ID of the publisher model.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.
TEXT: The text that you want to generate embeddings for. Limit: five texts of up to 2,048 tokens per text.
AUTO_TRUNCATE: If set to false, text that exceeds the token limit causes the request to fail. The default value is true.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict

Request JSON body:

{
  "instances": [
    { "content": "TEXT"}
  ],
  "parameters": { 
    "autoTruncate": AUTO_TRUNCATE 
  }
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login , or by using Cloud Shell, which automatically logs you into the gcloud CLI . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-embedding-004:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the following. Note that values has been truncated to save space.

Response

{
  "predictions": [
    {
      "embeddings": {
        "statistics": {
          "truncated": false,
          "token_count": 6
        },
        "values": [ ... ]
      }
    }
  ]
}

Example curl command

MODEL_ID="text-embedding-004"
PROJECT_ID=PROJECT_ID

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/${MODEL_ID}:predict -d \
$'{
  "instances": [
    { "content": "What is life?"}
  ],
}'

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import List, Optional

from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel


def embed_text(
    texts: List[str] = ["banana muffins? ", "banana bread? banana muffins?"],
    task: str = "RETRIEVAL_DOCUMENT",
    model_name: str = "text-embedding-004",
    dimensionality: Optional[int] = 256,
) -> List[List[float]]:
    """Embeds texts with a pre-trained, foundational model."""
    model = TextEmbeddingModel.from_pretrained(model_name)
    inputs = [TextEmbeddingInput(text, task) for text in texts]
    kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
    embeddings = model.get_embeddings(inputs, **kwargs)
    return [embedding.values for embedding in embeddings]

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"regexp"

	aiplatform "cloud.google.com/go/aiplatform/apiv1"
	"cloud.google.com/go/aiplatform/apiv1/aiplatformpb"

	"google.golang.org/api/option"
	"google.golang.org/protobuf/types/known/structpb"
)

func embedTexts(
	apiEndpoint, project, model string, texts []string,
	task string, customOutputDimensionality *int) ([][]float32, error) {
	ctx := context.Background()

	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
	if err != nil {
		return nil, err
	}
	defer client.Close()

	match := regexp.MustCompile(`^(\w+-\w+)`).FindStringSubmatch(apiEndpoint)
	location := "us-central1"
	if match != nil {
		location = match[1]
	}
	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
	instances := make([]*structpb.Value, len(texts))
	for i, text := range texts {
		instances[i] = structpb.NewStructValue(&structpb.Struct{
			Fields: map[string]*structpb.Value{
				"content":   structpb.NewStringValue(text),
				"task_type": structpb.NewStringValue(task),
			},
		})
	}
	outputDimensionality := structpb.NewNullValue()
	if customOutputDimensionality != nil {
		outputDimensionality = structpb.NewNumberValue(float64(*customOutputDimensionality))
	}
	params := structpb.NewStructValue(&structpb.Struct{
		Fields: map[string]*structpb.Value{"outputDimensionality": outputDimensionality},
	})

	req := &aiplatformpb.PredictRequest{
		Endpoint:   endpoint,
		Instances:  instances,
		Parameters: params,
	}
	resp, err := client.Predict(ctx, req)
	if err != nil {
		return nil, err
	}
	embeddings := make([][]float32, len(resp.Predictions))
	for i, prediction := range resp.Predictions {
		values := prediction.GetStructValue().Fields["embeddings"].GetStructValue().Fields["values"].GetListValue().Values
		embeddings[i] = make([]float32, len(values))
		for j, value := range values {
			embeddings[i][j] = float32(value.GetNumberValue())
		}
	}
	return embeddings, nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import static java.util.stream.Collectors.toList;

import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictRequest;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.Struct;
import com.google.protobuf.Value;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredictTextEmbeddingsSample {
  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // Details about text embedding request structure and supported models are available in:
    // https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings
    String endpoint = "us-central1-aiplatform.googleapis.com:443";
    String project = "YOUR_PROJECT_ID";
    String model = "text-embedding-004";
    predictTextEmbeddings(
        endpoint,
        project,
        model,
        List.of("banana bread?", "banana muffins?"),
        "QUESTION_ANSWERING",
        OptionalInt.of(256));
  }

  // Gets text embeddings from a pretrained, foundational model.
  public static List<List<Float>> predictTextEmbeddings(
      String endpoint,
      String project,
      String model,
      List<String> texts,
      String task,
      OptionalInt outputDimensionality)
      throws IOException {
    PredictionServiceSettings settings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();
    Matcher matcher = Pattern.compile("^(?<Location>\\w+-\\w+)").matcher(endpoint);
    String location = matcher.matches() ? matcher.group("Location") : "us-central1";
    EndpointName endpointName =
        EndpointName.ofProjectLocationPublisherModelName(project, location, "google", model);

    // You can use this prediction service client for multiple requests.
    try (PredictionServiceClient client = PredictionServiceClient.create(settings)) {
      PredictRequest.Builder request =
          PredictRequest.newBuilder().setEndpoint(endpointName.toString());
      if (outputDimensionality.isPresent()) {
        request.setParameters(
            Value.newBuilder()
                .setStructValue(
                    Struct.newBuilder()
                        .putFields("outputDimensionality", valueOf(outputDimensionality.getAsInt()))
                        .build()));
      }
      for (int i = 0; i < texts.size(); i++) {
        request.addInstances(
            Value.newBuilder()
                .setStructValue(
                    Struct.newBuilder()
                        .putFields("content", valueOf(texts.get(i)))
                        .putFields("taskType", valueOf(task))
                        .build()));
      }
      PredictResponse response = client.predict(request.build());
      List<List<Float>> floats = new ArrayList<>();
      for (Value prediction : response.getPredictionsList()) {
        Value embeddings = prediction.getStructValue().getFieldsOrThrow("embeddings");
        Value values = embeddings.getStructValue().getFieldsOrThrow("values");
        floats.add(
            values.getListValue().getValuesList().stream()
                .map(Value::getNumberValue)
                .map(Double::floatValue)
                .collect(toList()));
      }
      return floats;
    }
  }

  private static Value valueOf(String s) {
    return Value.newBuilder().setStringValue(s).build();
  }

  private static Value valueOf(int n) {
    return Value.newBuilder().setNumberValue(n).build();
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

async function main(
  project,
  model = 'text-embedding-004',
  texts = 'banana bread?;banana muffins?',
  task = 'QUESTION_ANSWERING',
  outputDimensionality = 0,
  apiEndpoint = 'us-central1-aiplatform.googleapis.com'
) {
  const aiplatform = require('@google-cloud/aiplatform');
  const {PredictionServiceClient} = aiplatform.v1;
  const {helpers} = aiplatform; // helps construct protobuf.Value objects.
  const clientOptions = {apiEndpoint: apiEndpoint};
  const location = 'us-central1';
  const endpoint = `projects/${project}/locations/${location}/publishers/google/models/${model}`;
  const parameters =
    outputDimensionality > 0
      ? helpers.toValue(outputDimensionality)
      : helpers.toValue(256);

  async function callPredict() {
    const instances = texts
      .split(';')
      .map(e => helpers.toValue({content: e, taskType: task}));
    const request = {endpoint, instances, parameters};
    const client = new PredictionServiceClient(clientOptions);
    const [response] = await client.predict(request);
    console.log('Got predict response');
    const predictions = response.predictions;
    for (const prediction of predictions) {
      const embeddings = prediction.structValue.fields.embeddings;
      const values = embeddings.structValue.fields.values.listValue.values;
      console.log('Got prediction: ' + JSON.stringify(values));
    }
  }

  callPredict();
}

Add an embedding to a vector database

After you've generated your embedding you can add embeddings to a vector database, like Vector Search. This enables low-latency retrieval, and is critical as the size of your data increases.

To learn more about Vector Search, see Overview of Vector Search.

Example use case: Develop a book recommendation chatbot

If you want to develop a book recommendation chatbot, the first thing to do is to use a deep neural network (DNN) to convert each book into an embedding vector, where one embedding vector represents one book. You can feed, as input to the DNN, just the book title or just the text content. Or you can use both of these together, along with any other metadata describing the book, such as the genre.

The embeddings in this example could be comprised of thousands of book titles with summaries and their genre, and it might have representations for books like Wuthering Heights by Emily Brontë and Persuasion by Jane Austen that are similar to each other (small distance between numerical representation). Whereas the numerical representation for the book The Great Gatsby by F. Scott Fitzgerald would be further, as the time period, genre, and summary is less similar.

The inputs are the main influence to the orientation of the embedding space. For example, if we only had book title inputs, then two books with similar titles, but very different summaries, could be close together. However, if we include the title and summary, then these same books are less similar (further away) in the embedding space.

Working with generative AI, this book-suggestion chatbot could summarize, suggest, and show you books which you might like (or dislike), based on your query.

API changes to models released on or after August 2023

When using model versions released on or after August 2023, including text-embedding-004 and textembedding-gecko-multilingual@001, there is a new task type parameter and the optional title (only valid with task_type=RETRIEVAL_DOCUMENT).

These new parameters apply to these public preview models and all stable models going forward.

{
  "instances": [
    {
      "task_type": "RETRIEVAL_DOCUMENT",
      "title": "document title",
      "content": "I would like embeddings for this text!"
    },
  ]
}

The task_type parameter is defined as the intended downstream application to help the model produce better quality embeddings. It is a string that can take on one of the following values:

`task_type`	Description
`RETRIEVAL_QUERY`	Specifies the given text is a query in a search or retrieval setting.
`RETRIEVAL_DOCUMENT`	Specifies the given text is a document in a search or retrieval setting.
`SEMANTIC_SIMILARITY`	Specifies the given text is used for Semantic Textual Similarity (STS).
`CLASSIFICATION`	Specifies that the embedding is used for classification.
`CLUSTERING`	Specifies that the embedding is used for clustering.
`QUESTION_ANSWERING`	Specifies that the query embedding is used for answering questions. Use RETRIEVAL_DOCUMENT for the document side.
`FACT_VERIFICATION`	Specifies that the query embedding is used for fact verification.

What's next

To get batch predictions for embeddings, see Get batch text embeddings predictions
To learn more about multimodal embeddings, see Get multimodal embeddings
To tune an embedding, see Tune text embeddings