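Image captions

imagetext is the name of the model that supports image captioning. imagetext generates a caption for an image you provide, in the language you specify. The model supports the following languages: English (en), German (de), French (fr), Spanish (es), and Italian (it).

To explore this model in the console, see the Image Captioning model card in Model Garden.

View the Imagen for Captioning & VQA model card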

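Use cases

Common use cases for image captioning include:

- Creators can generate captions for uploaded images and videos (for example, a short description of a video sequence)
- Generate captions that describe products
- Integrate captioning into an app through the API to create new experiences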

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagetext:predict

Request body

{
  "instances": [
    {
      "image": {
        // Union field can be only one of the following:
        "bytesBase64Encoded": string,
        "gcsUri": string,
        // End of list of possible types for union field.
        "mimeType": string
      }
    }
  ],
  "parameters": {
    "sampleCount": integer,
    "storageUri": string,
    "language": string,
    "seed": integer
  }
}

Use the following parameters for the Imagen model imagetext. For more information, see "Get image descriptions using visual captioning".

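Parameter	Description	Acceptable values
`instances`	An array that contains the object with the details of the image to caption.	Array (1 image object allowed)
`bytesBase64Encoded`	The image to caption.	Base64-encoded image string (PNG or JPEG, 20 MB max)
`gcsUri`	The Cloud Storage URI of the image to caption.	String URI of the image file in Cloud Storage (PNG or JPEG, 20 MB max)
`mimeType`	(Optional) The MIME type of the image you specify.	String (`image/jpeg` or `image/png`)
`sampleCount`	The number of text strings to generate.	Integer: 1-3
`seed`	(Optional) The seed for the random number generator (RNG). If two requests have the same inputs and the same RNG seed, the prediction results are the same.	Integer
`storageUri`	(Optional) The Cloud Storage location where the generated text responses are saved.	String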
`language`	(Optional) The language code of the generated captions.	String: `en` (default), `de`, `fr`, `it`, `es`
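
As a rough illustration of how these parameters fit together, the following Python sketch assembles a request body and checks it against the accepted values listed above. The helper name `build_caption_request`, its keyword arguments, the default `sample_count=1`, and the bucket path are assumptions made for this example; the client-side validation is a convenience and not part of the API.

```python
import json

# Language codes listed in the parameter table.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "it", "es"}
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def build_caption_request(*, image_b64=None, gcs_uri=None, mime_type=None,
                          sample_count=1, language="en", seed=None,
                          storage_uri=None):
    """Build a request body for imagetext:predict (hypothetical helper)."""
    # The image source is a union field: exactly one of the two may be set.
    if (image_b64 is None) == (gcs_uri is None):
        raise ValueError("set exactly one of image_b64 or gcs_uri")
    if not 1 <= sample_count <= 3:
        raise ValueError("sample_count must be between 1 and 3")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if mime_type is not None and mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")

    if image_b64 is not None:
        image = {"bytesBase64Encoded": image_b64}
    else:
        image = {"gcsUri": gcs_uri}
    if mime_type is not None:
        image["mimeType"] = mime_type

    # Optional parameters are only included when the caller sets them.
    parameters = {"sampleCount": sample_count, "language": language}
    if seed is not None:
        parameters["seed"] = seed
    if storage_uri is not None:
        parameters["storageUri"] = storage_uri

    return {"instances": [{"image": image}], "parameters": parameters}


# Placeholder bucket and object names, for illustration only.
body = build_caption_request(gcs_uri="gs://example-bucket/cake.png",
                             sample_count=2, language="es")
print(json.dumps(body, indent=2))
```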

Sample request

REST

To get image captions by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

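PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see "Generative AI on Vertex AI locations".
B64_IMAGE: The image to get captions for. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
RESPONSE_COUNT: The number of image captions to generate. Accepted integer values: 1 to 3.
LANGUAGE_CODE: One of the supported language codes. Supported languages:
- English (en)
- French (fr)
- German (de)
- Italian (it)
- Spanish (es)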

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict

Request JSON body:

{
  "instances": [
    {
      "image": {
          "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT,
    "language": "LANGUAGE_CODE"
  }
}
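
One possible way to produce this body from a local image file is sketched below in Python. The function name `write_request_file` and its arguments are hypothetical; the script only base64-encodes the file and writes the JSON shown above, and it does not check the size limit or the image format.

```python
import base64
import json
from pathlib import Path


def write_request_file(image_path, out_path="request.json",
                       response_count=2, language_code="en"):
    """Write a request body for a local image (hypothetical helper)."""
    # B64_IMAGE: the raw image bytes, base64-encoded as an ASCII string.
    image_bytes = Path(image_path).read_bytes()
    b64_image = base64.b64encode(image_bytes).decode("ascii")

    body = {
        "instances": [{"image": {"bytesBase64Encoded": b64_image}}],
        "parameters": {
            "sampleCount": response_count,  # RESPONSE_COUNT
            "language": language_code,      # LANGUAGE_CODE
        },
    }
    Path(out_path).write_text(json.dumps(body), encoding="utf-8")
    return body
```

For example, calling `write_request_file("cake.png", response_count=2, language_code="es")` (with `cake.png` standing in for any local PNG or JPEG file) would write a `request.json` that asks for two Spanish captions.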

To send your request, choose one of these options:

curl

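Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: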

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

PowerShell

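Note: The following command assumes that you have logged in to the gcloud CLI by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: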

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content
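
The same request could also be sent from Python. The sketch below uses only the standard library and mirrors the commands above: it reads request.json, obtains a token from the gcloud CLI, and posts to the same URL. The function names (`predict_url`, `build_http_request`, `caption_image`) are hypothetical, and error handling is left out.

```python
import json
import subprocess
import urllib.request


def predict_url(project_id, location):
    """Return the imagetext:predict URL for a project and region."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/imagetext:predict"
    )


def build_http_request(project_id, location, body, access_token):
    """Build (but do not send) the POST request with the same headers as curl."""
    return urllib.request.Request(
        predict_url(project_id, location),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def caption_image(project_id, location, request_path="request.json"):
    """Send the saved request body and return the parsed JSON response."""
    # Assumes the gcloud CLI is installed and logged in, as in the notes above.
    # On Windows the executable may need to be given as "gcloud.cmd".
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    with open(request_path, encoding="utf-8") as f:
        body = json.load(f)
    request = build_http_request(project_id, location, body, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request in a separate function keeps the URL and headers easy to inspect without making a network call; `caption_image("PROJECT_ID", "us-central1")` would then perform the actual request.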

The following sample responses are for a request with "sampleCount": 2. Each response returns two prediction strings.

English (en):

{
  "predictions": [
    "a yellow mug with a sheep on it sits next to a slice of cake",
    "a cup of coffee with a heart shaped latte art next to a slice of cake"
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID",
  "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
  "modelDisplayName": "MODEL_DISPLAYNAME",
  "modelVersionId": "1"
}

Spanish (es):

{
  "predictions": [
    "una taza de café junto a un plato de pastel de chocolate",
    "una taza de café con una forma de corazón en la espuma"
  ]
}

Response body

{
  "predictions": [ string ]
}

Response element	Description
`predictions`	A list of text strings representing captions, sorted by confidence.

Sample response

{
  "predictions": [
    "text1",
    "text2"
  ]
}
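
A response of this shape can be read with a few lines of Python. The sketch below is illustrative only: the helper name `extract_captions` is hypothetical, and treating the first entry as the preferred caption relies on the confidence ordering described above.

```python
import json


def extract_captions(response_text):
    """Return the list of caption strings from a predict response."""
    response = json.loads(response_text)
    predictions = response.get("predictions", [])
    # Guard against unexpected shapes before handing captions to callers.
    if not isinstance(predictions, list) or not all(
        isinstance(p, str) for p in predictions
    ):
        raise ValueError("predictions must be a list of strings")
    return predictions


sample = '{"predictions": ["text1", "text2"]}'
captions = extract_captions(sample)
# Captions are sorted by confidence, so the first one ranks highest.
best = captions[0] if captions else None
print(best)  # prints "text1"
```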
