Get multimodal embeddings

The multimodal embeddings model generates 1408-dimensional vectors based on the input you provide, which can include a combination of image, text, and video data. The embedding vectors can then be used for subsequent tasks like image classification or video content moderation.

The image embedding vector and the text embedding vector are in the same semantic space and have the same dimensionality. Consequently, these vectors can be used interchangeably for use cases like searching images by text or searching videos by image.
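As an illustration of what the shared semantic space enables, the following minimal sketch (not one of the official samples) scores a text embedding against an image embedding with cosine similarity; the variable names are illustrative, and it assumes you already obtained the embeddings from the model.

    import numpy as np

    def cosine_similarity(a, b):
        """Cosine similarity between two embedding vectors."""
        a, b = np.asarray(a), np.asarray(b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # text_embedding and image_embedding are 1408-dimensional vectors
    # previously returned by the multimodalembedding model.
    # score = cosine_similarity(text_embedding, image_embedding)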

For text-only embedding use cases, we recommend using the Vertex AI text-embeddings API instead. For example, the text-embeddings API might be better for text-based semantic search, clustering, long-form document analysis, and other text retrieval or question-answering use cases. For more information, see Get text embeddings.

Supported models

You can get multimodal embeddings by using the following model:

  • multimodalembedding

Best practices

Consider the following input aspects when you use the multimodal embeddings model:

  • Text in images: The model can distinguish text in images, similar to optical character recognition (OCR). If you need to differentiate between a description of the image content and the text within an image, consider using prompt engineering to specify your target content. For example, instead of just "cat", specify "picture of a cat" or "the text 'cat'", depending on your use case.

    the text 'cat'
    [Image: a picture of text that reads "cat"]

    picture of a cat
    [Image: a photograph of a cat. Image credit: Manja Vitolic on Unsplash.]

  • Embedding similarity: The dot product of embeddings isn't a calibrated probability. The dot product is a similarity metric, and it might have different score distributions for different use cases. Consequently, avoid using a fixed-value threshold to measure quality. Instead, use ranking approaches for retrieval, or use a sigmoid for classification, as in the sketch below.
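For example, the following sketch (an assumption about how you might apply this guidance, not an official sample) ranks candidates by dot product instead of thresholding raw scores:

    import numpy as np

    def rank_candidates(query_embedding, candidate_embeddings):
        """Return candidate indices ordered from most to least similar."""
        scores = np.asarray(candidate_embeddings) @ np.asarray(query_embedding)
        return np.argsort(-scores)

    # For classification, fit a calibration step (such as a logistic/sigmoid
    # function) on labeled data instead of comparing raw dot products to a
    # fixed threshold.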

API usage

API limits

The following limits apply when you use the multimodalembedding model for text and image embeddings:

Text and image data

  • Maximum API requests per minute per project: 120 to 600, depending on the region.
  • Maximum text length: 32 tokens (about 32 words). If an input exceeds 32 tokens, the model internally shortens it to this length.
  • Language: English.
  • Image formats: BMP, GIF, JPG, PNG.
  • Image size: 20 MB maximum, for both Base64-encoded images (when transcoded to PNG) and Cloud Storage images (original file format). To avoid increased network latency, use smaller images. The model also resizes images to a 512 x 512 pixel resolution, so you don't need to provide higher-resolution images; see the sketch after this list.

Video data

  • Audio support: N/A. The model doesn't consider audio content when generating video embeddings.
  • Video formats: AVI, FLV, MKV, MOV, MP4, MPEG, MPG, WEBM, WMV.
  • Maximum video length (Cloud Storage): no limit. However, only 2 minutes of content can be analyzed at a time.
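Because the service resizes images to 512 x 512 pixels anyway, you can downscale large images client-side before uploading them. A minimal sketch, assuming Pillow is installed and using illustrative file names:

    from PIL import Image

    # Downscale to at most 512 x 512 (aspect ratio preserved) to reduce
    # upload size and network latency; the service would resize it anyway.
    img = Image.open("my-large-image.png")
    img.thumbnail((512, 512))
    img.save("my-image-512.png")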

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

    Enable the API


  5. Set up authentication for your environment.

    Select the tab for how you plan to use the samples on this page:

    Java

    To use the Java samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. To initialize the gcloud CLI, run the following command:

      gcloud init
    4. After initializing the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta
    5. If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    6. For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    Node.js

    To use the Node.js samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. To initialize the gcloud CLI, run the following command:

      gcloud init
    4. After initializing the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta
    5. If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    6. For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    Python

    To use the Python samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

    1. Install the Google Cloud CLI.

    2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    3. To initialize the gcloud CLI, run the following command:

      gcloud init
    4. After initializing the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta
    5. If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    6. For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      After installing the Google Cloud CLI, initialize it by running the following command:

      gcloud init

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

      After initializing the gcloud CLI, update it and install the required components:

      gcloud components update
      gcloud components install beta

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

  6. To use the Python SDK, follow the instructions in Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.
  7. Optional. Review the pricing for this feature. Pricing for embeddings depends on the type of data you send (such as image or text), and it also depends on the mode you use for certain data types (such as Video Plus, Video Standard, or Video Essential).

    Locations

    A location is a region you can specify in a request to control where data is stored at rest. For a list of available regions, see Generative AI on Vertex AI locations.

    Error messages

    Quota exceeded error

    google.api_core.exceptions.ResourceExhausted: 429 Quota exceeded for
    aiplatform.googleapis.com/online_prediction_requests_per_base_model with base
    model: multimodalembedding. Please submit a quota increase request.
    

    If this is the first time you've received this error, use the Google Cloud console to request a quota adjustment for your project. Use the following filters before submitting your request:

    • Service ID: aiplatform.googleapis.com
    • metric: aiplatform.googleapis.com/online_prediction_requests_per_base_model
    • base_model:multimodalembedding

    Go to the Quotas page

    If you've already submitted a quota adjustment request, wait before submitting another one. If you need a further quota increase, repeat the request with a justification for sustained quota adjustments.
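    While a quota adjustment is pending, one common way to tolerate intermittent 429 errors is to retry with exponential backoff. A minimal sketch, assuming you're calling the model through the Vertex AI SDK for Python, which surfaces this error as a google-api-core ResourceExhausted exception:

      import time

      from google.api_core import exceptions

      def get_embeddings_with_retry(model, max_attempts=5, **kwargs):
          """Call model.get_embeddings, backing off on quota errors."""
          for attempt in range(max_attempts):
              try:
                  return model.get_embeddings(**kwargs)
              except exceptions.ResourceExhausted:
                  if attempt == max_attempts - 1:
                      raise
                  time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...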

    Specify lower-dimension embeddings

    By default, an embedding request returns a 1408-dimensional float vector for each data type. For text and image data, you can also specify lower-dimension embeddings (128-, 256-, or 512-dimensional float vectors). This option lets you optimize for latency and storage, or for quality, based on how you plan to use the embeddings. Lower-dimension embeddings reduce storage needs and lower latency for subsequent embedding tasks (like search or recommendation), while higher-dimension embeddings offer greater accuracy for the same tasks.
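    To make the storage side of this trade-off concrete, here is a back-of-the-envelope sketch (the embedding count is illustrative; it assumes each value is stored as a 4-byte float32):

      num_embeddings = 1_000_000
      for dim in (128, 256, 512, 1408):
          gigabytes = num_embeddings * dim * 4 / 1e9
          print(f"{dim:>4} dims: {gigabytes:.2f} GB for {num_embeddings:,} embeddings")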

    REST

    You can access lower-dimension embeddings by adding the parameters.dimension field. This parameter accepts one of the following values: 128, 256, 512, or 1408. The response includes embeddings of that dimension.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-img.png.

      You can also provide the image as a Base64-encoded byte string:

      [...]
      "image": {
        "bytesBase64Encoded": "B64_ENCODED_IMAGE"
      }
      [...]
      
    • TEXT: The target text to get embeddings for. For example, a cat.
    • EMBEDDING_DIMENSION: The number of embedding dimensions. Lower values offer decreased latency when you use these embeddings for subsequent tasks, while higher values offer better accuracy. Available values: 128, 256, 512, and 1408 (default).

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    JSON request body:

    {
      "instances": [
        {
          "image": {
            "gcsUri": "IMAGE_URI"
          },
          "text": "TEXT"
        }
      ],
      "parameters": {
        "dimension": EMBEDDING_DIMENSION
      }
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then execute the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The model returns an embedding float vector of the dimension you specified. The following sample responses are shortened for space.

    128 dimensions:

    {
      "predictions": [
        {
          "imageEmbedding": [
            0.0279239565,
            [...128 dimension vector...]
            0.00403284049
          ],
          "textEmbedding": [
            0.202921599,
            [...128 dimension vector...]
            -0.0365431122
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }

    256 dimensions:

    {
      "predictions": [
        {
          "imageEmbedding": [
            0.248620048,
            [...256 dimension vector...]
            -0.0646447465
          ],
          "textEmbedding": [
            0.0757875815,
            [...256 dimension vector...]
            -0.02749932
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }

    512 dimensions:

    {
      "predictions": [
        {
          "imageEmbedding": [
            -0.0523675755,
            [...512 dimension vector...]
            -0.0444030389
          ],
          "textEmbedding": [
            -0.0592851527,
            [...512 dimension vector...]
            0.0350437127
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    

    Python

    import vertexai
    
    from vertexai.vision_models import Image, MultiModalEmbeddingModel
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    # TODO(developer): Try different dimensions: 128, 256, 512, 1408
    embedding_dimension = 128
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    image = Image.load_from_file(
        "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
    )
    
    embeddings = model.get_embeddings(
        image=image,
        contextual_text="Colosseum",
        dimension=embedding_dimension,
    )
    
    print(f"Image Embedding: {embeddings.image_embedding}")
    print(f"Text Embedding: {embeddings.text_embedding}")
    
    # Example response:
    # Image Embedding: [0.0622573346, -0.0406507477, 0.0260440577, ...]
    # Text Embedding: [0.27469793, -0.146258667, 0.0222803634, ...]

    Go

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateWithLowerDimension shows how to generate lower-dimensional embeddings for text and image inputs.
    func generateWithLowerDimension(w io.Writer, project, location string) error {
    	// location = "us-central1"
    	ctx := context.Background()
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instance, err := structpb.NewValue(map[string]any{
    		"image": map[string]any{
    			// Image input can be provided either as a Google Cloud Storage URI or as
    			// base64-encoded bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png",
    		},
    		"text": "Colosseum",
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	// TODO(developer): Try different dimensions: 128, 256, 512, 1408
    	outputDimensionality := 128
    	params, err := structpb.NewValue(map[string]any{
    		"dimension": outputDimensionality,
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request params: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances:  []*structpb.Value{instance},
    		Parameters: params,
    	}
    
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		ImageEmbeddings []float32 `json:"imageEmbedding"`
    		TextEmbeddings  []float32 `json:"textEmbedding"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal JSON: %w", err)
    	}
    
    	imageEmbedding := instanceEmbeddings.ImageEmbeddings
    	textEmbedding := instanceEmbeddings.TextEmbeddings
    
    	fmt.Fprintf(w, "Text embedding (length=%d): %v\n", len(textEmbedding), textEmbedding)
    	fmt.Fprintf(w, "Image embedding (length=%d): %v\n", len(imageEmbedding), imageEmbedding)
    	// Example response:
    	// Text Embedding (length=128): [0.27469793 -0.14625867 0.022280363 ... ]
    	// Image Embedding (length=128): [0.06225733 -0.040650766 0.02604402 ... ]
    
    	return nil
    }
    

    Send an embedding request (image and text)

    Use the following code samples to send an embedding request with image and text data. The samples show how to send a request with both data types, but you can also use the service with a single data type, as in the sketch below.
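    For example, a minimal text-only request through the Vertex AI SDK for Python might look like the following sketch (the project ID is a placeholder):

      import vertexai
      from vertexai.vision_models import MultiModalEmbeddingModel

      # TODO(developer): Replace with your project ID.
      vertexai.init(project="your-project-id", location="us-central1")

      model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
      # Only contextual_text is provided; image and video are omitted.
      embeddings = model.get_embeddings(contextual_text="a cat")
      print(embeddings.text_embedding)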

    Get text and image embeddings

    REST

    For more information about multimodalembedding model requests, see the multimodalembedding model API reference.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • TEXT: The target text to get embeddings for. For example, a cat.
    • B64_ENCODED_IMG: The target image to get embeddings for. The image must be specified as a Base64-encoded byte string; see the sketch below.
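      A minimal sketch for producing this value from a local file (the file name is illustrative):

      import base64

      # Read the image and Base64-encode it for the "bytesBase64Encoded" field.
      with open("supermarket-img.png", "rb") as f:
          b64_encoded_img = base64.b64encode(f.read()).decode("utf-8")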

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    JSON request body:

    {
      "instances": [
        {
          "text": "TEXT",
          "image": {
            "bytesBase64Encoded": "B64_ENCODED_IMG"
          }
        }
      ]
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then execute the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The embedding the model returns is a 1408-dimensional float vector. The following sample response is shortened for space.
    {
      "predictions": [
        {
          "textEmbedding": [
            0.010477379,
            -0.00399621,
            0.00576670747,
            [...]
            -0.00823613815,
            -0.0169572588,
            -0.00472954148
          ],
          "imageEmbedding": [
            0.00262696808,
            -0.00198890246,
            0.0152047109,
            -0.0103145819,
            [...]
            0.0324628279,
            0.0284924973,
            0.011650892,
            -0.00452344026
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    

    Vertex AI SDK for Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

    import vertexai
    from vertexai.vision_models import Image, MultiModalEmbeddingModel
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    image = Image.load_from_file(
        "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
    )
    
    embeddings = model.get_embeddings(
        image=image,
        contextual_text="Colosseum",
        dimension=1408,
    )
    print(f"Image Embedding: {embeddings.image_embedding}")
    print(f"Text Embedding: {embeddings.text_embedding}")
    # Example response:
    # Image Embedding: [-0.0123147098, 0.0727171078, ...]
    # Text Embedding: [0.00230263756, 0.0278981831, ...]
    

    Node.js

    Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    /**
     * TODO(developer): Uncomment these variables before running the sample.
     * (Not necessary if passing values as arguments)
     */
    // const project = 'YOUR_PROJECT_ID';
    // const location = 'YOUR_PROJECT_LOCATION';
    // const baseImagePath = 'YOUR_BASE_IMAGE_PATH';
    // const textPrompt = 'YOUR_TEXT_PROMPT';
    const aiplatform = require('@google-cloud/aiplatform');
    
    // Imports the Google Cloud Prediction service client
    const {PredictionServiceClient} = aiplatform.v1;
    
    // Import the helper module for converting arbitrary protobuf.Value objects.
    const {helpers} = aiplatform;
    
    // Specifies the location of the api endpoint
    const clientOptions = {
      apiEndpoint: 'us-central1-aiplatform.googleapis.com',
    };
    const publisher = 'google';
    const model = 'multimodalembedding@001';
    
    // Instantiates a client
    const predictionServiceClient = new PredictionServiceClient(clientOptions);
    
    async function predictImageFromImageAndText() {
      // Configure the parent resource
      const endpoint = `projects/${project}/locations/${location}/publishers/${publisher}/models/${model}`;
    
      const fs = require('fs');
      const imageFile = fs.readFileSync(baseImagePath);
    
      // Convert the image data to a Buffer and base64 encode it.
      const encodedImage = Buffer.from(imageFile).toString('base64');
    
      const prompt = {
        text: textPrompt,
        image: {
          bytesBase64Encoded: encodedImage,
        },
      };
      const instanceValue = helpers.toValue(prompt);
      const instances = [instanceValue];
    
      const parameter = {
        sampleCount: 1,
      };
      const parameters = helpers.toValue(parameter);
    
      const request = {
        endpoint,
        instances,
        parameters,
      };
    
      // Predict request
      const [response] = await predictionServiceClient.predict(request);
      console.log('Get image embedding response');
      const predictions = response.predictions;
      console.log('\tPredictions :');
      for (const prediction of predictions) {
        console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
      }
    }
    
    await predictImageFromImageAndText();

    Java

    Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    
    import com.google.cloud.aiplatform.v1beta1.EndpointName;
    import com.google.cloud.aiplatform.v1beta1.PredictResponse;
    import com.google.cloud.aiplatform.v1beta1.PredictionServiceClient;
    import com.google.cloud.aiplatform.v1beta1.PredictionServiceSettings;
    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    import com.google.protobuf.InvalidProtocolBufferException;
    import com.google.protobuf.Value;
    import com.google.protobuf.util.JsonFormat;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Base64;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    public class PredictImageFromImageAndTextSample {
    
      public static void main(String[] args) throws IOException {
        // TODO(developer): Replace this variable before running the sample.
        String project = "YOUR_PROJECT_ID";
        String textPrompt = "YOUR_TEXT_PROMPT";
        String baseImagePath = "YOUR_BASE_IMAGE_PATH";
    
        // Optional parameters to send with the prediction request.
        Map<String, Object> parameters = new HashMap<String, Object>();
        parameters.put("sampleCount", 1);
    
        String location = "us-central1";
        String publisher = "google";
        String model = "multimodalembedding@001";
    
        predictImageFromImageAndText(
            project, location, publisher, model, textPrompt, baseImagePath, parameters);
      }
    
      // Get embeddings for the given image and text inputs.
      public static void predictImageFromImageAndText(
          String project,
          String location,
          String publisher,
          String model,
          String textPrompt,
          String baseImagePath,
          Map<String, Object> parameters)
          throws IOException {
        final String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
        final PredictionServiceSettings predictionServiceSettings =
            PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();
    
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests.
        try (PredictionServiceClient predictionServiceClient =
            PredictionServiceClient.create(predictionServiceSettings)) {
          final EndpointName endpointName =
              EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);
    
          // Convert the image to Base64
          byte[] imageData = Base64.getEncoder().encode(Files.readAllBytes(Paths.get(baseImagePath)));
          String encodedImage = new String(imageData, StandardCharsets.UTF_8);
    
          JsonObject jsonInstance = new JsonObject();
          jsonInstance.addProperty("text", textPrompt);
          JsonObject jsonImage = new JsonObject();
          jsonImage.addProperty("bytesBase64Encoded", encodedImage);
          jsonInstance.add("image", jsonImage);
    
          Value instanceValue = stringToValue(jsonInstance.toString());
          List<Value> instances = new ArrayList<>();
          instances.add(instanceValue);
    
          Gson gson = new Gson();
          String gsonString = gson.toJson(parameters);
          Value parameterValue = stringToValue(gsonString);
    
          PredictResponse predictResponse =
              predictionServiceClient.predict(endpointName, instances, parameterValue);
          System.out.println("Predict Response");
          System.out.println(predictResponse);
          for (Value prediction : predictResponse.getPredictionsList()) {
            System.out.format("\tPrediction: %s\n", prediction);
          }
        }
      }
    
      // Convert a Json string to a protobuf.Value
      static Value stringToValue(String value) throws InvalidProtocolBufferException {
        Value.Builder builder = Value.newBuilder();
        JsonFormat.parser().merge(value, builder);
        return builder.build();
      }
    }

    Go

    Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateForTextAndImage shows how to use the multimodal model to generate embeddings for
    // text and image inputs.
    func generateForTextAndImage(w io.Writer, project, location string) error {
    	// location = "us-central1"
    	ctx := context.Background()
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instance, err := structpb.NewValue(map[string]any{
    		"image": map[string]any{
    			// Image input can be provided either as a Google Cloud Storage URI or as
    			// base64-encoded bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png",
    		},
    		"text": "Colosseum",
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances: []*structpb.Value{instance},
    	}
    
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		ImageEmbeddings []float32 `json:"imageEmbedding"`
    		TextEmbeddings  []float32 `json:"textEmbedding"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal JSON: %w", err)
    	}
    
    	imageEmbedding := instanceEmbeddings.ImageEmbeddings
    	textEmbedding := instanceEmbeddings.TextEmbeddings
    
    	fmt.Fprintf(w, "Text embedding (length=%d): %v\n", len(textEmbedding), textEmbedding)
    	fmt.Fprintf(w, "Image embedding (length=%d): %v\n", len(imageEmbedding), imageEmbedding)
    	// Example response:
    	// Text embedding (length=1408): [0.0023026613 0.027898183 -0.011858357 ... ]
    	// Image embedding (length=1408): [-0.012314269 0.07271844 0.00020170923 ... ]
    
    	return nil
    }
    

    Send an embedding request (video, image, or text)

    When you send an embedding request, you can specify an input video alone, or you can specify a combination of video, image, and text data.

    Video embedding modes

    You can use one of three modes for video embeddings: Essential, Standard, or Plus. The mode corresponds to the density of the embeddings generated, which you specify with the interval_sec setting in the request. One embedding is generated for each video interval of length interval_sec. The minimum video interval length is 4 seconds. Interval lengths greater than 120 seconds might negatively affect the quality of the generated embeddings.

    Pricing for video embeddings depends on the mode you use. For more information, see pricing.

    The following list summarizes the three modes you can use for video embeddings; a sketch that maps an intervalSec value to its mode follows the list:

    • Essential: up to 4 embeddings per minute of video; minimum interval of 15 seconds. Corresponds to intervalSec >= 15.
    • Standard: up to 8 embeddings per minute of video; minimum interval of 8 seconds. Corresponds to 8 <= intervalSec < 15.
    • Plus: up to 15 embeddings per minute of video; minimum interval of 4 seconds. Corresponds to 4 <= intervalSec < 8.
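    The following sketch (not an official sample) expresses the same mapping in code:

      def video_embedding_mode(interval_sec: int) -> str:
          """Return the embedding mode that a given intervalSec falls into."""
          if interval_sec < 4:
              raise ValueError("intervalSec must be at least 4 seconds")
          if interval_sec < 8:
              return "Plus"
          if interval_sec < 15:
              return "Standard"
          return "Essential"

      print(video_embedding_mode(10))  # Standard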

    Video embedding best practices

    Consider the following when you send video embedding requests:

    • To generate a single embedding for the first two minutes of an input video of any length, use the following videoSegmentConfig setting:

      request.json

      // other request body content
      "videoSegmentConfig": {
        "intervalSec": 120
      }
      // other request body content
      
    • To generate embeddings for a video that's longer than two minutes, you can send multiple requests that specify the start and end times in videoSegmentConfig, as in the sketch below:

      request1.json

      // other request body content
      "videoSegmentConfig": {
        "startOffsetSec": 0,
        "endOffsetSec": 120
      }
      // other request body content
      

      request2.json

      // other request body content
      "videoSegmentConfig": {
        "startOffsetSec": 120,
        "endOffsetSec": 240
      }
      // other request body content
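
      A sketch of generating one such request body per two-minute window (the video length and URI are illustrative):

      import json

      video_length_sec = 600  # for example, a 10-minute video
      request_bodies = []
      for start in range(0, video_length_sec, 120):
          request_bodies.append({
              "instances": [{
                  "video": {
                      "gcsUri": "gs://my-bucket/embeddings/long-video.mp4",
                      "videoSegmentConfig": {
                          "startOffsetSec": start,
                          "endOffsetSec": min(start + 120, video_length_sec),
                      },
                  },
              }],
          })
      print(json.dumps(request_bodies[0], indent=2))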
      

    Get video embeddings

    Use the following sample to get embeddings for video content alone.

    REST

    For more information about multimodalembedding model requests, see the multimodalembedding model API reference.

    The following sample uses a video located in Cloud Storage. You can also use the video.bytesBase64Encoded field to provide a Base64-encoded string representation of the video.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4.

      You can also provide the video as a Base64-encoded byte string:

      [...]
      "video": {
        "bytesBase64Encoded": "B64_ENCODED_VIDEO"
      }
      [...]
      
    • videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS): Optional. The specific video segments (in seconds) that embeddings are generated for.

      For example:

      [...]
      "videoSegmentConfig": {
        "startOffsetSec": 10,
        "endOffsetSec": 60,
        "intervalSec": 10
      }
      [...]

      Using this configuration specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, so you're charged at the Standard mode pricing rate.

      If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, so you're charged at the Essential mode pricing rate.

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    JSON request body:

    {
      "instances": [
        {
          "video": {
            "gcsUri": "VIDEO_URI",
            "videoSegmentConfig": {
              "startOffsetSec": START_SECOND,
              "endOffsetSec": END_SECOND,
              "intervalSec": INTERVAL_SECONDS
            }
          }
        }
      ]
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then execute the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The embedding the model returns is a 1408-dimensional float vector. The following sample responses are shortened for space.

    Response (7-second video, no videoSegmentConfig specified):

    {
      "predictions": [
        {
          "videoEmbeddings": [
            {
              "endOffsetSec": 7,
              "embedding": [
                -0.0045467657,
                0.0258095954,
                0.0146885719,
                0.00945400633,
                [...]
                -0.0023291884,
                -0.00493789,
                0.00975185353,
                0.0168156829
              ],
              "startOffsetSec": 0
            }
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }

    Response (59-second video, with the following video segment configuration: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 60, "intervalSec": 10 }):

    {
      "predictions": [
        {
          "videoEmbeddings": [
            {
              "endOffsetSec": 10,
              "startOffsetSec": 0,
              "embedding": [
                -0.00683252793,
                0.0390476175,
                [...]
                0.00657121744,
                0.013023301
              ]
            },
            {
              "startOffsetSec": 10,
              "endOffsetSec": 20,
              "embedding": [
                -0.0104404651,
                0.0357737206,
                [...]
                0.00509833824,
                0.0131902946
              ]
            },
            {
              "startOffsetSec": 20,
              "embedding": [
                -0.0113538112,
                0.0305239167,
                [...]
                -0.00195809244,
                0.00941874553
              ],
              "endOffsetSec": 30
            },
            {
              "embedding": [
                -0.00299320649,
                0.0322436653,
                [...]
                -0.00993082579,
                0.00968887936
              ],
              "startOffsetSec": 30,
              "endOffsetSec": 40
            },
            {
              "endOffsetSec": 50,
              "startOffsetSec": 40,
              "embedding": [
                -0.00591270532,
                0.0368893594,
                [...]
                -0.00219071587,
                0.0042470959
              ]
            },
            {
              "embedding": [
                -0.00458270218,
                0.0368121453,
                [...]
                -0.00317760976,
                0.00595594104
              ],
              "endOffsetSec": 59,
              "startOffsetSec": 50
            }
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    

    Vertex AI SDK for Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

    import vertexai
    
    from vertexai.vision_models import MultiModalEmbeddingModel, Video
    from vertexai.vision_models import VideoSegmentConfig
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    
    embeddings = model.get_embeddings(
        video=Video.load_from_file(
            "gs://cloud-samples-data/vertex-ai-vision/highway_vehicles.mp4"
        ),
        video_segment_config=VideoSegmentConfig(end_offset_sec=1),
    )
    
    # Video Embeddings are segmented based on the video_segment_config.
    print("Video Embeddings:")
    for video_embedding in embeddings.video_embeddings:
        print(
            f"Video Segment: {video_embedding.start_offset_sec} - {video_embedding.end_offset_sec}"
        )
        print(f"Embedding: {video_embedding.embedding}")
    
    # Example response:
    # Video Embeddings:
    # Video Segment: 0.0 - 1.0
    # Embedding: [-0.0206376351, 0.0123456789, ...]
    

    Go

    Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    	"time"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateForVideo shows how to use the multimodal model to generate embeddings for video input.
    func generateForVideo(w io.Writer, project, location string) error {
    	// location = "us-central1"
    
    	// The default context timeout may be not enough to process a video input.
    	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    	defer cancel()
    
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instances, err := structpb.NewValue(map[string]any{
    		"video": map[string]any{
    			// Video input can be provided either as a Google Cloud Storage URI or as base64-encoded
    			// bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/vertex-ai-vision/highway_vehicles.mp4",
    			"videoSegmentConfig": map[string]any{
    				"startOffsetSec": 1,
    				"endOffsetSec":   5,
    			},
    		},
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances: []*structpb.Value{instances},
    	}
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		VideoEmbeddings []struct {
    			Embedding      []float32 `json:"embedding"`
    			StartOffsetSec float64   `json:"startOffsetSec"`
    			EndOffsetSec   float64   `json:"endOffsetSec"`
    		} `json:"videoEmbeddings"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal json: %w", err)
    	}
    	// Get the embedding for our single video segment (`.videoEmbeddings` object has one entry per
    	// each processed segment).
    	videoEmbedding := instanceEmbeddings.VideoEmbeddings[0]
    
    	fmt.Fprintf(w, "Video embedding (seconds: %.f-%.f; length=%d): %v\n",
    		videoEmbedding.StartOffsetSec,
    		videoEmbedding.EndOffsetSec,
    		len(videoEmbedding.Embedding),
    		videoEmbedding.Embedding,
    	)
    	// Example response:
    	// Video embedding (seconds: 1-5; length=1408): [-0.016427778 0.032878537 -0.030755188 ... ]
    
    	return nil
    }
    

    Get image, text, and video embeddings

    Use the following sample to get embeddings for video, text, and image content.

    REST

    For more information about multimodalembedding model requests, see the multimodalembedding model API reference.

    The following sample uses image, text, and video data. You can use any combination of these data types in your request body.

    Additionally, this sample uses a video located in Cloud Storage. You can also use the video.bytesBase64Encoded field to provide a Base64-encoded string representation of the video.

    Before using any of the request data, make the following replacements:

    • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
    • PROJECT_ID: Your Google Cloud project ID.
    • TEXT: The target text to get embeddings for. For example, a cat.
    • IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-img.png.

      You can also provide the image as a Base64-encoded byte string:

      [...]
      "image": {
        "bytesBase64Encoded": "B64_ENCODED_IMAGE"
      }
      [...]
      
    • VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4.

      You can also provide the video as a Base64-encoded byte string:

      [...]
      "video": {
        "bytesBase64Encoded": "B64_ENCODED_VIDEO"
      }
      [...]
      
    • videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS): Optional. The specific video segments (in seconds) that embeddings are generated for.

      For example:

      [...]
      "videoSegmentConfig": {
        "startOffsetSec": 10,
        "endOffsetSec": 60,
        "intervalSec": 10
      }
      [...]

      Using this configuration specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, so you're charged at the Standard mode pricing rate.

      If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, so you're charged at the Essential mode pricing rate.

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict

    JSON request body:

    {
      "instances": [
        {
          "text": "TEXT",
          "image": {
            "gcsUri": "IMAGE_URI"
          },
          "video": {
            "gcsUri": "VIDEO_URI",
            "videoSegmentConfig": {
              "startOffsetSec": START_SECOND,
              "endOffsetSec": END_SECOND,
              "intervalSec": INTERVAL_SECONDS
            }
          }
        }
      ]
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json, and then execute the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"

    PowerShell

    Save the request body in a file named request.json, and then execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
    The embedding the model returns is a 1408-dimensional float vector. The following sample response is shortened for space.
    {
      "predictions": [
        {
          "textEmbedding": [
            0.0105433334,
            -0.00302835181,
            0.00656806398,
            0.00603460241,
            [...]
            0.00445805816,
            0.0139605571,
            -0.00170318608,
            -0.00490092579
          ],
          "videoEmbeddings": [
            {
              "startOffsetSec": 0,
              "endOffsetSec": 7,
              "embedding": [
                -0.00673126569,
                0.0248149596,
                0.0128901172,
                0.0107588246,
                [...]
                -0.00180952181,
                -0.0054573305,
                0.0117037306,
                0.0169312079
              ]
            }
          ],
          "imageEmbedding": [
            -0.00728622358,
            0.031021487,
            -0.00206603738,
            0.0273937676,
            [...]
            -0.00204976718,
            0.00321615417,
            0.0121978866,
            0.0193375275
          ]
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID"
    }
    

    Vertex AI SDK for Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

    import vertexai
    
    from vertexai.vision_models import Image, MultiModalEmbeddingModel, Video
    from vertexai.vision_models import VideoSegmentConfig
    
    # TODO(developer): Update & uncomment line below
    # PROJECT_ID = "your-project-id"
    vertexai.init(project=PROJECT_ID, location="us-central1")
    
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    
    image = Image.load_from_file(
        "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
    )
    video = Video.load_from_file(
        "gs://cloud-samples-data/vertex-ai-vision/highway_vehicles.mp4"
    )
    
    embeddings = model.get_embeddings(
        image=image,
        video=video,
        video_segment_config=VideoSegmentConfig(end_offset_sec=1),
        contextual_text="Cars on Highway",
    )
    
    print(f"Image Embedding: {embeddings.image_embedding}")
    
    # Video Embeddings are segmented based on the video_segment_config.
    print("Video Embeddings:")
    for video_embedding in embeddings.video_embeddings:
        print(
            f"Video Segment: {video_embedding.start_offset_sec} - {video_embedding.end_offset_sec}"
        )
        print(f"Embedding: {video_embedding.embedding}")
    
    print(f"Text Embedding: {embeddings.text_embedding}")
    # Example response:
    # Image Embedding: [-0.0123144267, 0.0727186054, 0.000201397663, ...]
    # Video Embeddings:
    # Video Segment: 0.0 - 1.0
    # Embedding: [-0.0206376351, 0.0345234685, ...]
    # Text Embedding: [-0.0207006838, -0.00251058186, ...]
    

    Go

    Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

    To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"io"
    	"time"
    
    	aiplatform "cloud.google.com/go/aiplatform/apiv1beta1"
    	aiplatformpb "cloud.google.com/go/aiplatform/apiv1beta1/aiplatformpb"
    	"google.golang.org/api/option"
    	"google.golang.org/protobuf/encoding/protojson"
    	"google.golang.org/protobuf/types/known/structpb"
    )
    
    // generateForImageTextAndVideo shows how to use the multimodal model to generate embeddings for
    // image, text and video data.
    func generateForImageTextAndVideo(w io.Writer, project, location string) error {
    	// location = "us-central1"
    
    	// The default context timeout may be not enough to process a video input.
    	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    	defer cancel()
    
    	apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
    	client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
    	if err != nil {
    		return fmt.Errorf("failed to construct API client: %w", err)
    	}
    	defer client.Close()
    
    	model := "multimodalembedding@001"
    	endpoint := fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s", project, location, model)
    
    	// This is the input to the model's prediction call. For schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#request_body
    	instance, err := structpb.NewValue(map[string]any{
    		"text": "Domestic cats in natural conditions",
    		"image": map[string]any{
    			// Image and video inputs can be provided either as a Google Cloud Storage URI or as
    			// base64-encoded bytes using the "bytesBase64Encoded" field.
    			"gcsUri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg",
    		},
    		"video": map[string]any{
    			"gcsUri": "gs://cloud-samples-data/video/cat.mp4",
    		},
    	})
    	if err != nil {
    		return fmt.Errorf("failed to construct request payload: %w", err)
    	}
    
    	req := &aiplatformpb.PredictRequest{
    		Endpoint: endpoint,
    		// The model supports only 1 instance per request.
    		Instances: []*structpb.Value{instance},
    	}
    
    	resp, err := client.Predict(ctx, req)
    	if err != nil {
    		return fmt.Errorf("failed to generate embeddings: %w", err)
    	}
    
    	instanceEmbeddingsJson, err := protojson.Marshal(resp.GetPredictions()[0])
    	if err != nil {
    		return fmt.Errorf("failed to convert protobuf value to JSON: %w", err)
    	}
    	// For response schema, see:
    	// https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings-api#response-body
    	var instanceEmbeddings struct {
    		ImageEmbeddings []float32 `json:"imageEmbedding"`
    		TextEmbeddings  []float32 `json:"textEmbedding"`
    		VideoEmbeddings []struct {
    			Embedding      []float32 `json:"embedding"`
    			StartOffsetSec float64   `json:"startOffsetSec"`
    			EndOffsetSec   float64   `json:"endOffsetSec"`
    		} `json:"videoEmbeddings"`
    	}
    	if err := json.Unmarshal(instanceEmbeddingsJson, &instanceEmbeddings); err != nil {
    		return fmt.Errorf("failed to unmarshal JSON: %w", err)
    	}
    
    	imageEmbedding := instanceEmbeddings.ImageEmbeddings
    	textEmbedding := instanceEmbeddings.TextEmbeddings
    	// Get the embedding for our single video segment (`.videoEmbeddings` object has one entry per
    	// each processed segment).
    	videoEmbedding := instanceEmbeddings.VideoEmbeddings[0].Embedding
    
    	fmt.Fprintf(w, "Image embedding (length=%d): %v\n", len(imageEmbedding), imageEmbedding)
    	fmt.Fprintf(w, "Text embedding (length=%d): %v\n", len(textEmbedding), textEmbedding)
    	fmt.Fprintf(w, "Video embedding (length=%d): %v\n", len(videoEmbedding), videoEmbedding)
    	// Example response:
    	// Image embedding (length=1408): [-0.01558477 0.0258355 0.016342038 ... ]
    	// Text embedding (length=1408): [-0.005894961 0.008349559 0.015355394 ... ]
    	// Video embedding (length=1408): [-0.018867437 0.013997682 0.0012682161 ... ]
    
    	return nil
    }
    

    What's next