Class MultiModalEmbeddingModel (1.95.1)
MultiModalEmbeddingModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Generates embedding vectors from images and videos.
Example:

from vertexai.preview.vision_models import Image, MultiModalEmbeddingModel, Video

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("image.png")
video = Video.load_from_file("video.mp4")

embeddings = model.get_embeddings(
    image=image,
    video=video,
    contextual_text="Hello world",
)
image_embedding = embeddings.image_embedding
video_embeddings = embeddings.video_embeddings
text_embedding = embeddings.text_embedding
Methods
MultiModalEmbeddingModel
MultiModalEmbeddingModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Creates a _ModelGardenModel.

This constructor should not be called directly. Use
MultiModalEmbeddingModel.from_pretrained(model_name=...) instead.
from_pretrained
from_pretrained(model_name: str) -> vertexai._model_garden._model_garden_models.T
Loads a _ModelGardenModel.
Exceptions

ValueError
    If model_name is unknown.
ValueError
    If the model does not support this class.
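A minimal sketch of loading this model and handling the errors above; the project and location passed to vertexai.init are placeholders, not values from this page:

import vertexai
from vertexai.preview.vision_models import MultiModalEmbeddingModel

vertexai.init(project="my-project", location="us-central1")  # placeholder project/region

try:
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
except ValueError as err:
    # Raised if the model name is unknown or the model does not support this class.
    print(f"Could not load model: {err}")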
get_embeddings
get_embeddings(
    image: typing.Optional[vertexai.vision_models.Image] = None,
    video: typing.Optional[vertexai.vision_models.Video] = None,
    contextual_text: typing.Optional[str] = None,
    dimension: typing.Optional[int] = None,
    video_segment_config: typing.Optional[
        vertexai.vision_models.VideoSegmentConfig
    ] = None,
) -> vertexai.vision_models.MultiModalEmbeddingResponse
Gets embedding vectors for the provided image, video, and/or contextual text.
Parameters

image (Image)
    Optional. The image to generate embeddings for. One of image, video, or contextual_text is required.

video (Video)
    Optional. The video to generate embeddings for. One of image, video, or contextual_text is required.

contextual_text (str)
    Optional. Contextual text for your input image or video. If provided, the model will also generate an embedding vector for the provided contextual text. The returned image and text embedding vectors are in the same semantic space with the same dimensionality, and the vectors can be used interchangeably for use cases like searching image by text or searching text by image. One of image, video, or contextual_text is required.

dimension (int)
    Optional. The number of embedding dimensions. Lower values offer decreased latency when using these embeddings for subsequent tasks, while higher values offer better accuracy. Available values: 128, 256, 512, and 1408 (default).

video_segment_config (VideoSegmentConfig)
    Optional. The specific video segments (in seconds) the embeddings are generated for.
Returns

MultiModalEmbeddingResponse
    The image, video, and text embedding vectors.
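As a usage illustration, a hedged sketch of a get_embeddings call that requests 512-dimensional image and text vectors and limits video embeddings to the first 30 seconds of the clip. The VideoSegmentConfig argument names (start_offset_sec, end_offset_sec, interval_sec) and the fields read from each returned video embedding are assumptions about the current SDK, not taken from this page:

from vertexai.preview.vision_models import (
    Image,
    MultiModalEmbeddingModel,
    Video,
    VideoSegmentConfig,
)

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

embeddings = model.get_embeddings(
    image=Image.load_from_file("image.png"),
    video=Video.load_from_file("video.mp4"),
    contextual_text="Hello world",
    dimension=512,  # one of 128, 256, 512, or 1408 (default)
    video_segment_config=VideoSegmentConfig(
        start_offset_sec=0,   # assumed parameter names: embed only the
        end_offset_sec=30,    # first 30 seconds of the video,
        interval_sec=15,      # one embedding per 15-second segment
    ),
)

print(len(embeddings.image_embedding))  # 512
print(len(embeddings.text_embedding))   # 512
for segment in embeddings.video_embeddings:
    # Each entry covers one video segment (field names assumed).
    print(segment.start_offset_sec, segment.end_offset_sec, len(segment.embedding))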