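Get image descriptions using visual captioning

Caution: Starting on June 24, 2025, Imagen versions 1 and 2 are deprecated. The Imagen models imagegeneration@002, imagegeneration@005, and imagegeneration@006 will be removed on September 24, 2025. For more information about migrating to Imagen 3, see "Migrate to Imagen 3".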

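Visual captioning generates a relevant description for an image. You can use this information for a variety of purposes:

Get detailed metadata about images for storing and searching.
Generate automated captions to support accessibility use cases.
Get quick descriptions of products and visual assets.

Image source: Santhosh Kumar on Unsplash (cropped)

Caption (short-form): a blue shirt with white polka dots is hanging on a hook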

Supported languages

Visual captioning is available in the following languages:

English (en)
French (fr)
German (de)
Italian (it)
Spanish (es)

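Performance and limitations

The following limits apply when you use this model:

Limit	Value
Maximum number of API requests (short-form) per minute per project	500
Maximum number of tokens returned in a response (short-form)	64 tokens
Maximum number of tokens accepted in a request (VQA short-form only)	80 tokens

The following service latency estimates apply when you use this model. These values are illustrative and are not a promise of service:

Latency	Value
API requests (short-form)	1.5 seconds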

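Locations

A location is a region that you can specify in a request to control where data is stored at rest. For a list of available regions, see "Generative AI on Vertex AI locations".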

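Responsible AI safety filtering

The image captioning and Visual Question Answering (VQA) feature models don't support user-configurable safety filters. However, the overall Imagen safety filtering applies to the following data:

User input
Model output

As a result, your output may differ from the sample output if Imagen applies these safety filters. Consider the following examples.

Filtered input

If the input is filtered, the response is similar to the following: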

{
  "error": {
    "code": 400,
    "message": "Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.DebugInfo",
        "detail": "[ORIGINAL ERROR] generic::invalid_argument: Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394 [google.rpc.error_details_ext] { message: \"Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394\" }"
      }
    ]
  }
}
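
As a rough illustration of handling the error above, the following Python sketch checks whether a raw response body looks like a safety-filter block. The helper name is_blocked_by_safety_filter is hypothetical, and matching on the "response is blocked" substring is an assumption based only on the sample message shown here; the service may word the message differently.

```python
import json


def is_blocked_by_safety_filter(response_body: str) -> bool:
    """Return True if a JSON error body resembles the blocked-input sample.

    Assumption: a block is signalled by status INVALID_ARGUMENT plus the
    phrase "response is blocked" in the message, as in the sample above.
    """
    payload = json.loads(response_body)
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    message = error.get("message", "")
    return (
        error.get("status") == "INVALID_ARGUMENT"
        and "response is blocked" in message
    )


# A shortened copy of the sample error response.
sample = json.dumps(
    {
        "error": {
            "code": 400,
            "message": (
                "Media reasoning failed with the following error: "
                "The response is blocked, as it may violate our policies."
            ),
            "status": "INVALID_ARGUMENT",
        }
    }
)

print(is_blocked_by_safety_filter(sample))
print(is_blocked_by_safety_filter('{"predictions": ["cappuccino"]}'))
```

With the shortened sample, the first call prints True and the second prints False.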

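Filtered output

If the number of responses returned is less than the sample count you specified, the missing responses were filtered out by Responsible AI. For example, the following is a response to a request with "sampleCount": 2, where one of the responses was filtered out: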

{
  "predictions": [
    "cappuccino"
  ]
}

If all of the output is filtered, the response is an empty object similar to the following:

{}
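
The two cases above can be told apart by comparing the number of returned predictions with the requested sample count. The following is a minimal Python sketch; count_filtered_captions is a hypothetical helper, and it assumes the response has already been parsed into a dictionary.

```python
def count_filtered_captions(response: dict, sample_count: int) -> int:
    """Return how many requested captions are missing from the response.

    An empty object ({}) has no "predictions" key, so every requested
    caption is counted as filtered.
    """
    predictions = response.get("predictions", [])
    return max(sample_count - len(predictions), 0)


# One of two captions filtered, as in the first sample response.
print(count_filtered_captions({"predictions": ["cappuccino"]}, 2))

# All output filtered: the response is an empty object.
print(count_filtered_captions({}, 2))
```

The first call prints 1 and the second prints 2.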

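Get short-form image captions

Use the following samples to generate short-form captions for an image.

REST

For more information about imagetext model requests, see the imagetext model API reference.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see "Generative AI on Vertex AI locations".
B64_IMAGE: The image to get captions for. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
RESPONSE_COUNT: The number of image captions to generate. Accepted integer values: 1 to 3.
LANGUAGE_CODE: One of the supported language codes. Supported languages:
- English (en)
- French (fr)
- German (de)
- Italian (it)
- Spanish (es)

HTTP method and URL: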

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict

Request JSON body:

{
  "instances": [
    {
      "image": {
          "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT,
    "language": "LANGUAGE_CODE"
  }
}
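
The URL and request body above can also be assembled programmatically. The following Python sketch is illustrative only: build_predict_url and build_caption_request are hypothetical helper names, and the size check assumes that the 10 MB limit applies to the raw image bytes (the source does not say whether it applies before or after base64 encoding).

```python
import base64
import json

SUPPORTED_LANGUAGES = {"en", "fr", "de", "it", "es"}
# Assumption: "10 MB" is treated as 10 * 1024 * 1024 raw bytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def build_predict_url(project_id: str, location: str) -> str:
    """Fill PROJECT_ID and LOCATION into the predict URL shown above."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/"
        f"{project_id}/locations/{location}/publishers/google/models/"
        "imagetext:predict"
    )


def build_caption_request(
    image_bytes: bytes, response_count: int = 1, language: str = "en"
) -> dict:
    """Validate the inputs and return the JSON request body as a dict."""
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10 MB size limit")
    if not 1 <= response_count <= 3:
        raise ValueError("response_count must be an integer from 1 to 3")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [{"image": {"bytesBase64Encoded": encoded}}],
        "parameters": {"sampleCount": response_count, "language": language},
    }


# Placeholder bytes stand in for a real image file read in binary mode.
body = build_caption_request(b"fake image bytes", response_count=2, language="es")
print(build_predict_url("my-project", "us-central1"))
print(json.dumps(body, indent=2))
```

Writing json.dumps(body) to a file named request.json would produce a body in the same shape as the one shown above.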

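To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: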

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

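PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: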

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content

The following sample responses are for a request with "sampleCount": 2. Each response returns two prediction strings.

English (en):

{
  "predictions": [
    "a yellow mug with a sheep on it sits next to a slice of cake",
    "a cup of coffee with a heart shaped latte art next to a slice of cake"
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID",
  "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
  "modelDisplayName": "MODEL_DISPLAYNAME",
  "modelVersionId": "1"
}

Spanish (es):

{
  "predictions": [
    "una taza de café junto a un plato de pastel de chocolate",
    "una taza de café con una forma de corazón en la espuma"
  ]
}

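Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

In this sample, you use the load_from_file method to reference a local file as the base Image to get a caption for. After you specify the base image, you call the get_captions method on the ImageTextModel and print the output.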


import vertexai
from vertexai.preview.vision_models import Image, ImageTextModel

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# input_file = "input-image.png"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = ImageTextModel.from_pretrained("imagetext@001")
source_img = Image.load_from_file(location=input_file)

captions = model.get_captions(
    image=source_img,
    # Optional parameters
    language="en",
    number_of_results=2,
)

print(captions)
# Example response:
# ['a cat with green eyes looks up at the sky']

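Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

In this sample, you call the predict method on a PredictionServiceClient. The service returns captions for the provided image.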

/**
 * TODO(developer): Update these variables before running the sample.
 */
const projectId = process.env.CAIP_PROJECT_ID;
const location = 'us-central1';
const inputFile = 'resources/cat.png';

const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: `${location}-aiplatform.googleapis.com`,
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function getShortFormImageCaptions() {
  const fs = require('fs');
  // Configure the parent resource
  const endpoint = `projects/${projectId}/locations/${location}/publishers/google/models/imagetext@001`;

  const imageFile = fs.readFileSync(inputFile);
  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const instance = {
    image: {
      bytesBase64Encoded: encodedImage,
    },
  };
  const instanceValue = helpers.toValue(instance);
  const instances = [instanceValue];

  const parameter = {
    // Optional parameters
    language: 'en',
    sampleCount: 2,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  const predictions = response.predictions;
  if (predictions.length === 0) {
    console.log(
      'No captions were generated. Check the request parameters and image.'
    );
  } else {
    predictions.forEach(prediction => {
      console.log(prediction.stringValue);
    });
  }
}
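// Top-level await is not available in a CommonJS script (this sample uses
// require), so handle the returned promise explicitly.
getShortFormImageCaptions().catch(err => {
  console.error(err);
  process.exitCode = 1;
});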

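Use parameters for image captions

When you get image captions, you can set several parameters depending on your use case.

Number of results

Use the number of results parameter to limit the number of captions returned for each request. For more information, see the imagetext (image captioning) model API reference.

Seed number

A seed is a number that you add to a request to make the generated captions deterministic. Adding a seed number to your request ensures that you get the same predictions (captions) each time. However, the captions aren't necessarily returned in the same order. For more information, see the imagetext (image captioning) model API reference.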
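
Because seeded requests can return the same captions in a different order, a reproducibility check should ignore ordering. The following is a minimal Python sketch of such a comparison; same_captions is a hypothetical helper, and the two lists stand in for the predictions arrays of two responses to the same seeded request.

```python
from collections import Counter


def same_captions(first: list[str], second: list[str]) -> bool:
    """Compare two caption lists while ignoring the order of the items.

    Counter keeps duplicate captions distinct, so two copies of a caption
    in one list only match two copies in the other.
    """
    return Counter(first) == Counter(second)


run_a = [
    "a yellow mug with a sheep on it sits next to a slice of cake",
    "a cup of coffee with a heart shaped latte art next to a slice of cake",
]
# The same captions, returned in the opposite order.
run_b = list(reversed(run_a))

print(same_captions(run_a, run_b))
print(same_captions(run_a, run_a[:1]))
```

The first comparison prints True because only the order differs; the second prints False because one caption is missing.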

What's next

Read articles about Imagen and other Generative AI on Vertex AI products.
