This guide provides information about optional metadata for vector embeddings. Vector Search lets you define metadata for each embedding.
Metadata is non-filterable, arbitrary information that Vector Search can store for each embedding. This can provide embeddings with useful context such as:
Product details, such as name, price, and an image URL.
Descriptions, snippets, dates, and authorship for text embeddings.
User information for user embeddings.
Coordinates for place embeddings.
Key features and benefits
Features and benefits of using metadata include:
Context with results: Information can be provided directly in your search results, which eliminates the need for separate lookups and reduces latency.
Flexible structure: Metadata is provided as a JSON object, which allows the metadata to be defined as complex, nested data.
Non-filterable: Vector embedding metadata is for storing and retrieving non-filterable information that's distinct from restricts and numeric_restricts.
Efficient updates: The update_mask field lets you update only the metadata, so you don't need to resubmit embedding vectors.
Decoupled information: Non-filterable information can be separated from filterable attributes like restricts.
Streamlined development: Search responses include the metadata associated with a vector embedding, which reduces the complexity of features such as displaying rich search results and performing context-based post-processing.
Data format
An optional embedding_metadata field holds a JSON object that
flexibly associates rich, non-filterable information with embeddings in
Vector Search. This can streamline applications by returning context
with results and allows efficient metadata-only updates using update_mask
for the upsertDatapoints API.
Example data point structure:
{
"id": "movie_001",
"embedding": [0.1, 0.2, ..., 0.3],
"sparse_embedding": {
"values": [-0.4, 0.2, -1.3],
"dimensions": [10, 20, 30]
},
"numeric_restricts": [{"namespace": "year", "value_int": 2022}],
"restricts": [{"namespace": "genre", "allow": ["action", "comedy"]}],
# --- New embedding_metadata field ---
"embedding_metadata": {
"title": "Ballet Train",
"runtime": {
"hours": 2,
"minutes": 6
},
"review_info": {
"review": "This movie is fun and...",
"rotten_potatoes_rating": 76
}
}
# ------------------------------------
},
# ... other data points
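The structure above can also be assembled programmatically. The following is a minimal Python sketch, assuming a hypothetical helper named make_datapoint (it is not part of any Vector Search client library) that builds a plain dictionary with the field names from the example and checks that the metadata is a JSON object. The three-element vector is a placeholder.

```python
import json


def make_datapoint(datapoint_id, embedding, metadata=None,
                   restricts=None, numeric_restricts=None):
    """Build a data point dict using the field names from the example above.

    Hypothetical helper for illustration; not part of a Vector Search SDK.
    """
    point = {"id": datapoint_id, "embedding": list(embedding)}
    if restricts:
        point["restricts"] = restricts
    if numeric_restricts:
        point["numeric_restricts"] = numeric_restricts
    if metadata is not None:
        if not isinstance(metadata, dict):
            raise TypeError("embedding_metadata must be a JSON object (dict)")
        # Round-trip through json to fail early on non-serializable values.
        point["embedding_metadata"] = json.loads(json.dumps(metadata))
    return point


point = make_datapoint(
    "movie_001",
    [0.1, 0.2, 0.3],  # placeholder vector; real embeddings are much longer
    metadata={"title": "Ballet Train", "runtime": {"hours": 2, "minutes": 6}},
    restricts=[{"namespace": "genre", "allow": ["action", "comedy"]}],
    numeric_restricts=[{"namespace": "year", "value_int": 2022}],
)
print(json.dumps(point, indent=2))
```

Keeping filterable attributes in restricts and numeric_restricts and everything else in embedding_metadata mirrors the separation described earlier in this guide.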
Ingesting data with embedding_metadata
When adding data points, you can include embedding_metadata when one of the
following actions occurs:
- Uploading a file (Cloud Storage):
  - Use JSON or AVRO formats. CSV isn't supported for embedding_metadata.
- Using the upsertDatapoints API:
  - Pass data point objects (including embedding_metadata) in the API request payload.
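For the file-upload path, the following Python sketch writes data points to a local JSON file before it is copied to Cloud Storage. It assumes that the JSON input format expects one data point object per line; confirm the layout against the input format reference before relying on it. The function name write_datapoints and the file name are hypothetical, and the upload step itself is omitted.

```python
import json
import os
import tempfile

datapoints = [
    {
        "id": "movie_001",
        "embedding": [0.1, 0.2, 0.3],  # placeholder vector
        "embedding_metadata": {"title": "Ballet Train"},
    },
    # embedding_metadata is optional, so this data point omits it.
    {"id": "movie_002", "embedding": [0.4, 0.5, 0.6]},
]


def write_datapoints(path, points):
    """Write one JSON object per line (assumed layout for JSON input files)."""
    with open(path, "w", encoding="utf-8") as f:
        for p in points:
            f.write(json.dumps(p) + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "datapoints.json")
write_datapoints(out_path, datapoints)

# Read the file back to confirm that each line parses as a JSON object.
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
parsed = [json.loads(line) for line in lines]
print(len(parsed), "data points written to", out_path)
```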
Retrieving embedding_metadata during queries
When you run a standard nearest-neighbor search with the findNeighbors API,
the response includes the embedding_metadata field for each neighbor if
returnFullDatapoint is set to true.
curl
curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${PUBLIC_ENDPOINT_DOMAIN}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/indexEndpoints/${INDEX_ENDPOINT_ID}:findNeighbors" \
-d '{"deployedIndexId": "'"${DEPLOYED_INDEX_ID}"'", "queries": [{"datapoint": {"featureVector": "<FEATURE_VECTOR>"}}], "returnFullDatapoint": true}'
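Once a response arrives, the metadata can be read directly from each neighbor. The following Python sketch builds the same request body as the curl example and collects metadata from a response. The response key names used here (nearestNeighbors, neighbors, datapoint, datapointId, embeddingMetadata) are assumptions about the JSON shape, and sample_response is hand-written for illustration, not real API output. Both function names are hypothetical.

```python
import json


def build_find_neighbors_body(deployed_index_id, feature_vector):
    """Request body mirroring the curl example, with full data points enabled."""
    return {
        "deployedIndexId": deployed_index_id,
        "queries": [{"datapoint": {"featureVector": list(feature_vector)}}],
        "returnFullDatapoint": True,
    }


def collect_metadata(response):
    """Map data point ID to metadata for every neighbor that carries metadata.

    The key names are assumptions; adjust them to the response you receive.
    """
    result = {}
    for query_result in response.get("nearestNeighbors", []):
        for neighbor in query_result.get("neighbors", []):
            datapoint = neighbor.get("datapoint", {})
            if "embeddingMetadata" in datapoint:
                result[datapoint.get("datapointId")] = datapoint["embeddingMetadata"]
    return result


body = build_find_neighbors_body("my_deployed_index", [0.1, 0.2, 0.3])
print(json.dumps(body))

# Hand-written sample in the assumed shape; not real API output.
sample_response = {
    "nearestNeighbors": [
        {
            "neighbors": [
                {"datapoint": {"datapointId": "movie_001",
                               "embeddingMetadata": {"title": "Ballet Train"}}},
                {"datapoint": {"datapointId": "movie_002"}},
            ]
        }
    ]
}
metadata_by_id = collect_metadata(sample_response)
print(metadata_by_id)
```

Neighbors without metadata are skipped rather than mapped to an empty value, so callers can distinguish "no metadata stored" from "empty metadata object".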
Updating embedding_metadata
To update metadata, call the upsertDatapoints API with update_mask set to
embedding_metadata. The update_mask field can also include additional
mask values. For uses of a field mask, see
Update embedding metadata.
The update_mask field helps to ensure that only embedding_metadata is updated,
avoiding resubmission of restrict and embedding fields.
The following example defines updated metadata for a target IndexDatapoint,
specifies update_mask, and calls upsertDatapoints.
curl
curl -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/indexes/${INDEX_ID}:upsertDatapoints" \
-d '{
  "datapoints": [
    {
      "datapoint_id": "'"${DATAPOINT_ID_1}"'",
      "feature_vector": [...],
      "embedding_metadata": {"title": "updated title", "rating": 4.5, "tags": ["updated", "reviewed"]}
    }
  ],
  "update_mask": "embedding_metadata"
}'
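The same request body can be generated for several data points at once. The following Python sketch assumes, based on the description above, that a request with update_mask set to embedding_metadata can leave out the feature vector and restricts; verify this against your index before depending on it. The function name build_metadata_update is hypothetical, and sending the request is left out.

```python
import json


def build_metadata_update(updates):
    """Build an upsertDatapoints body that targets only embedding_metadata.

    updates: mapping of data point ID to its new embedding_metadata dict.
    Assumption: masked-out fields such as the feature vector can be omitted.
    """
    return {
        "datapoints": [
            {"datapoint_id": datapoint_id, "embedding_metadata": metadata}
            for datapoint_id, metadata in updates.items()
        ],
        "update_mask": "embedding_metadata",
    }


payload = build_metadata_update({
    "movie_001": {"title": "updated title", "rating": 4.5,
                  "tags": ["updated", "reviewed"]},
})
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to curl with the -d option in place of the inline body shown above.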