Vector Search embeddings with metadata

This guide provides information about optional metadata for vector embeddings. Vector Search lets you define metadata for each embedding.

Metadata is non-filterable, arbitrary information that Vector Search can store for each embedding. This can provide embeddings with useful context such as:

  • Product details, such as name, price, and an image URL.

  • Descriptions, snippets, dates, and authorship for text embeddings.

  • User information for user embeddings.

  • Coordinates for place embeddings.

Key features and benefits

Features and benefits of using metadata include:

  • Context with results: Information can be provided directly in your search results, which eliminates the need for separate lookups and reduces latency.

  • Flexible structure: Metadata is provided as a JSON object, which allows the metadata to be defined as complex, nested data.

  • Non-Filterable: Vector embedding metadata is for storing and retrieving non-filterable information that's distinct from restricts and numeric_restricts.

  • Efficient updates: The update_mask field lets you specify that APIs only update metadata to avoid resubmitting embedding vectors.

  • Decoupled Information: Non-filterable information can be separated from filterable attributes like restricts.

  • Streamlined development: Search responses include metadata associated with a vector embedding, while reducing the complexity needed for features such as displaying rich search results and performing context-based post-processing.

Data format

An optional embedding_metadata field holds a JSON object that flexibly associates rich, non-filterable information with embeddings in Vector Search. This can streamline applications by returning context with results and allows efficient metadata-only updates using update_mask for the upsertDatapoints API.

Example data point structure:

    {
        "id": "movie_001",
        "embedding": [0.1, 0.2, ..., 0.3],
        "sparse_embedding": {
            "values": [-0.4, 0.2, -1.3],
            "dimensions": [10, 20, 30]
        },
        "numeric_restricts": [{'namespace': 'year', 'value_int': 2022}],
        "restricts": [{'namespace': 'genre', 'allow': ['action', 'comedy']}],

        # --- New embedding_metadata field ---
        "embedding_metadata": {
            "title": "Ballet Train",
            "runtime": {
                "hours": 2,
                "minutes": 6
            },
            "review_info": {
                "review": "This movie is fun and...",
                "rotten_potatoes_rating": 76
            }
        }
        # ------------------------------------
    },
    # ... other data points

Ingesting data with embedding_metadata

When adding data points, you can include embedding_metadata when one of the following actions occurs:

  • Uploading a file (Cloud Storage):
    • Use JSON or AVRO formats. CSV isn't supported for embedding_metadata.
  • Using the upsertDatapoints API:
    • Pass data point objects (including embedding_metadata) in the API request payload.

Retrieving embedding_metadata during queries

When performing a standard nearest-neighbor search using the findNeighbors API, the embedding_metadata field for each neighbor is automatically included in the response if returnFullDatapoint is set to True.

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${PUBLIC_ENDPOINT_DOMAIN}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/indexEndpoints/${INDEX_ENDPOINT_ID}:findNeighbors" \
-d '{deployedIndexId:"${DEPLOYED_INDEX_ID}", "queries":[{datapoint:{"featureVector":"<FEATURE_VECTOR>"}}], returnFullDatapoint:true}'

Updating embedding_metadata

Update metadata using the upsertDatapoints API and an update_mask using the value embedding_metadata. The update_mask field might also include additional mask values. For uses of a field mask, see Update embedding metadata.

The update_mask field helps to ensure that only embedding_metadata is updated, avoiding resubmission of restrict and embedding fields.

The following example demonstrates how to define and update metadata to create a targeted IndexDatapoint, specifying update_mask, and calling upsertDatapoints.

curl

curl -H "Content-Type: application/json" -H "Authorization: Bearer `gcloud auth print-access-token`" https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/indexes/${INDEX_ID}:upsertDatapoints \
-d '{
datapoints:[
    {
        datapoint_id: "'${DATAPOINT_ID_1}'",
        feature_vector: [...],
        embedding_metadata:{"title": "updated title", "rating": 4.5, "tags": ["updated", "reviewed"]
    }, update_mask: "embedding_metadata"}'