Image captioning

Note: As of June 24, 2025, Imagen versions 1 and 2 are deprecated. The Imagen models imagegeneration@002, imagegeneration@005, and imagegeneration@006 will be removed on September 24, 2025. For details about migrating to Imagen 3, see Migrate to Imagen 3.

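imagetext is the name of the model that supports image captioning. imagetext generates captions for an image that you provide, in the language that you specify. The model supports the following languages: English (en), German (de), French (fr), Spanish (es), and Italian (it).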

To explore this model in the console, see the Image Captioning model card in Model Garden.

View the Imagen for Captioning & VQA model card

Use cases

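Some common use cases for image captioning include:

- Creators can generate captions for uploaded images and videos (for example, a short description of a video sequence)
- Generating captions to describe products
- Integrating captioning into an app through the API to build new experiences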

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagetext:predict

Request body

{
  "instances": [
    {
      "image": {
        // Union field can be only one of the following:
        "bytesBase64Encoded": string,
        "gcsUri": string,
        // End of list of possible types for union field.
        "mimeType": string
      }
    }
  ],
  "parameters": {
    "sampleCount": integer,
    "storageUri": string,
    "language": string,
    "seed": integer
  }
}

Use the following parameters with the Imagen model imagetext. For more information, see Get image descriptions using visual captioning.

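Parameter	Description	Accepted values
`instances`	An array that contains the object with the details of the image to caption.	Array (1 image object allowed)
`bytesBase64Encoded`	The image to caption, as base64-encoded bytes. Specify either this field or `gcsUri`.	Base64-encoded image string (PNG or JPEG, 20 MB max)
`gcsUri`	The Cloud Storage URI of the image to caption. Specify either this field or `bytesBase64Encoded`.	String URI of the image file in Cloud Storage (PNG or JPEG, 20 MB max)
`mimeType`	Optional. The MIME type of the image that you specify.	String (`image/jpeg` or `image/png`)
`sampleCount`	The number of text strings to generate.	Integer: 1-3
`seed`	Optional. The seed for the random number generator (RNG). If requests with the same input use the same RNG seed, the prediction results are the same.	Integer
`storageUri`	Optional. The Cloud Storage location where the generated text responses are saved.	String
`language`	Optional. The language of the generated captions.	String: `en` (default), `de`, `fr`, `it`, `es`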
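The constraints in the table above can be checked on the client before a request is sent. The following is a minimal Python sketch, assuming you build the request body yourself rather than through a client library. `build_caption_request` is a hypothetical helper name, not part of any Google SDK. It enforces only what the table states (exactly one of `bytesBase64Encoded` or `gcsUri`, `sampleCount` from 1 to 3, the five language codes, and the two MIME types). It does not check image size.

```python
import base64
from typing import Optional

# Constraints copied from the parameter table; nothing else is validated.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "it", "es"}
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def build_caption_request(
    image_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
    mime_type: Optional[str] = None,
    sample_count: int = 1,
    language: str = "en",
    seed: Optional[int] = None,
    storage_uri: Optional[str] = None,
) -> dict:
    """Build an imagetext:predict request body (hypothetical helper)."""
    # "image" is a union field: exactly one of the two sources must be set.
    if (image_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of image_bytes or gcs_uri")
    if not 1 <= sample_count <= 3:
        raise ValueError("sampleCount must be between 1 and 3")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if mime_type is not None and mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported mimeType: {mime_type}")

    image: dict = {}
    if image_bytes is not None:
        # The API expects the raw bytes encoded as a base64 ASCII string.
        image["bytesBase64Encoded"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        image["gcsUri"] = gcs_uri
    if mime_type is not None:
        image["mimeType"] = mime_type

    parameters: dict = {"sampleCount": sample_count, "language": language}
    # Optional parameters are included only when they are set.
    if seed is not None:
        parameters["seed"] = seed
    if storage_uri is not None:
        parameters["storageUri"] = storage_uri

    return {"instances": [{"image": image}], "parameters": parameters}
```

For example, `build_caption_request(gcs_uri="gs://BUCKET/cake.jpg", mime_type="image/jpeg", sample_count=2, language="es")` returns a body for an image in Cloud Storage. `BUCKET` is a placeholder.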

Sample request

REST

To get image captions through the Vertex AI API, send a POST request to the publisher model endpoint.

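Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
B64_IMAGE: The image to get captions for. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
RESPONSE_COUNT: The number of image captions to generate. Accepted integer values: 1-3.
LANGUAGE_CODE: One of the supported language codes. Supported languages:
- English (en)
- French (fr)
- German (de)
- Italian (it)
- Spanish (es)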

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict

Request JSON body:

{
  "instances": [
    {
      "image": {
        "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT,
    "language": "LANGUAGE_CODE"
  }
}
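Filling in the placeholders by hand is error-prone for B64_IMAGE. The following minimal Python sketch reads a local image, base64-encodes it, and writes the request JSON above to `request.json`, which is the file that the curl and PowerShell commands read. `write_request_json` is a hypothetical helper name. RESPONSE_COUNT is unquoted in the template, so the sketch writes it as a JSON number. It does not enforce the size limit.

```python
import base64
import json
from pathlib import Path


def write_request_json(image_path: str, response_count: int, language_code: str,
                       out_path: str = "request.json") -> dict:
    """Fill in B64_IMAGE, RESPONSE_COUNT, and LANGUAGE_CODE, then save the body."""
    # B64_IMAGE: the raw file bytes encoded as a base64 ASCII string.
    b64_image = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    body = {
        "instances": [{"image": {"bytesBase64Encoded": b64_image}}],
        "parameters": {
            # Written as a JSON integer, matching the unquoted placeholder.
            "sampleCount": response_count,
            "language": language_code,
        },
    }
    Path(out_path).write_text(json.dumps(body), encoding="utf-8")
    return body
```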

To send your request, choose one of these options:

curl

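Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which logs you in to the gcloud CLI automatically. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: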

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

PowerShell

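Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: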

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content
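If you prefer Python to curl or PowerShell, the same request can be sent with the standard library alone. This is a minimal sketch, assuming the gcloud CLI is installed and logged in as described in the notes above. `predict_url` and `send_request` are hypothetical helper names. The token comes from the same `gcloud auth print-access-token` command that the curl example uses.

```python
import json
import subprocess
import urllib.request


def predict_url(project_id: str, location: str) -> str:
    """Build the imagetext:predict publisher-model endpoint for a region."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/imagetext:predict"
    )


def send_request(project_id: str, location: str, body: dict) -> dict:
    """POST the body with a gcloud access token and return the parsed JSON."""
    # Equivalent to $(gcloud auth print-access-token) in the curl command.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    req = urllib.request.Request(
        predict_url(project_id, location),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, load the saved body with `json.load(open("request.json"))`, then pass it to `send_request("PROJECT_ID", "us-central1", body)`.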

The following sample responses are for a request with "sampleCount": 2. Each response returns two prediction strings.

English (en):

{
  "predictions": [
    "a yellow mug with a sheep on it sits next to a slice of cake",
    "a cup of coffee with a heart shaped latte art next to a slice of cake"
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID",
  "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
  "modelDisplayName": "MODEL_DISPLAYNAME",
  "modelVersionId": "1"
}

Spanish (es):

{
  "predictions": [
    "una taza de café junto a un plato de pastel de chocolate",
    "una taza de café con una forma de corazón en la espuma"
  ]
}

Response body

{
  "predictions": [ string ]
}

Response element	Description
`predictions`	A list of text strings that represent the image captions, sorted by confidence.
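Because `predictions` is an ordered list of plain strings, reading the captions back takes only a few lines. This is a minimal Python sketch. `extract_captions` is a hypothetical helper name. It ignores metadata fields such as `deployedModelId` that may also appear in the response.

```python
import json
from typing import List


def extract_captions(response_text: str) -> List[str]:
    """Return the caption strings from an imagetext:predict response body."""
    data = json.loads(response_text)
    # Captions are already sorted by confidence, so index 0 is the top caption.
    predictions = data.get("predictions", [])
    if not all(isinstance(p, str) for p in predictions):
        raise ValueError("expected 'predictions' to be a list of strings")
    return list(predictions)
```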

Sample response

{
  "predictions": [
    "text1",
    "text2"
  ]
}
