Use Visual Question Answering (VQA) to get image information

Note: As of June 24, 2025, Imagen versions 1 and 2 are deprecated. The Imagen models imagegeneration@002, imagegeneration@005, and imagegeneration@006 will be removed on September 24, 2025. For more information about migrating to Imagen 3, see "Migrate to Imagen 3".

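Visual Question Answering (VQA) lets you provide an image to the model and ask a question about the image's contents. In response to your question, you get one or more natural language answers.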

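Example VQA image, question, and answers in the console. Image source (shown in the Google Cloud console): Sharon Pittaway on Unsplash.

Prompt question: What objects are in the image?

Answer 1: marbles

Answer 2: glass marbles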

Supported languages

VQA is available in the following languages:

English (en)

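Performance and limits

The following limits apply when you use this model:

Limit	Value
Maximum number of short-form API requests per minute per project	500
Maximum number of tokens returned in a response (short-form)	64 tokens
Maximum number of tokens accepted in a request (VQA short-form only)	80 tokens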
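
A client that sends many requests may want to stay under the per-minute request limit on its own side. The following is a minimal sketch of a sliding-window throttle, not part of any Google SDK. The class name `RequestThrottle` and its methods are hypothetical, and the defaults assume the 500 requests per minute value from the table applies to your project.

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side sliding-window limiter (hypothetical helper, not an SDK class)."""

    def __init__(self, max_requests=500, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._sent = deque()  # timestamps of requests inside the current window

    def seconds_until_allowed(self):
        """Return 0.0 if a request may be sent now, else the wait time in seconds."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Record that a request was just sent."""
        self._sent.append(self._clock())
```

A caller would check `seconds_until_allowed()`, sleep for the returned duration if it is greater than zero, send the request, and then call `record()`. The service enforces the real quota, so this only reduces the chance of rejected requests.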

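The following service latency estimates apply when you use this model. These values are illustrative and are not a promise of service:

Latency	Value
API requests (short-form)	1.5 seconds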

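Locations

A location is a region that you can specify in a request to control where data is stored at rest. For a list of available regions, see "Generative AI on Vertex AI locations".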

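Responsible AI safety filtering

The image captioning and Visual Question Answering (VQA) feature models don't support user-configurable safety filters. However, the overall Imagen safety filtering applies to the following data:

User input
Model output

As a result, your output may differ from the sample output if Imagen applies these safety filters. Consider the following examples.

Filtered input

If the input is filtered, the response is similar to the following: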

{
  "error": {
    "code": 400,
    "message": "Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.DebugInfo",
        "detail": "[ORIGINAL ERROR] generic::invalid_argument: Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394 [google.rpc.error_details_ext] { message: \"Media reasoning failed with the following error: The response is blocked, as it may violate our policies. If you believe this is an error, please send feedback to your account team. Error Code: 63429089, 72817394\" }"
      }
    ]
  }
}
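
Client code may want to distinguish this filtered-input error from other 400 errors. The following is a minimal sketch that inspects the error body shown above. The function name `is_safety_block` is hypothetical, and matching on the message text is an assumption: the wording and error codes are not documented as stable, so treat the check as a heuristic.

```python
import json


def is_safety_block(error_body: str) -> bool:
    """Heuristically detect the filtered-input error from a raw JSON error body."""
    try:
        payload = json.loads(error_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    # Assumption: a blocked request keeps the code, status, and phrase
    # that appear in the sample error response.
    return (
        error.get("code") == 400
        and error.get("status") == "INVALID_ARGUMENT"
        and "response is blocked" in str(error.get("message", ""))
    )
```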

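Filtered output

If the number of responses returned is less than the sample count you specify, the missing responses were filtered by Responsible AI. For example, the following is the response to a request with "sampleCount": 2, where one of the responses was filtered out: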

{
  "predictions": [
    "cappuccino"
  ]
}

If all the output is filtered, the response is an empty object similar to the following:

{}
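
Both cases can be handled with the same code path by comparing the number of returned predictions with the requested sample count. The following is a minimal sketch that assumes the response has already been parsed into a Python dict. The function name `summarize_predictions` is hypothetical.

```python
def summarize_predictions(response: dict, sample_count: int) -> dict:
    """Report the returned answers and how many were filtered out."""
    # A fully filtered response is an empty object, so "predictions" may be absent.
    answers = list(response.get("predictions", []))
    filtered = max(sample_count - len(answers), 0)
    return {"answers": answers, "filtered": filtered}


partial = summarize_predictions({"predictions": ["cappuccino"]}, sample_count=2)
empty = summarize_predictions({}, sample_count=2)
print(partial)  # {'answers': ['cappuccino'], 'filtered': 1}
print(empty)    # {'answers': [], 'filtered': 2}
```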

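Use VQA on an image (short responses)

Use the following samples to ask a question about an image and get an answer.

REST

For more information about imagetext model requests, see the imagetext model API reference.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see "Generative AI on Vertex AI locations".
VQA_PROMPT: The question you want to ask about your image. For example:
- What color is this shoe?
- What type of sleeves are on the shirt?
B64_IMAGE: The image to ask the question about. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
RESPONSE_COUNT: The number of answers to generate. Accepted integer values: 1 to 3.

HTTP method and URL: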

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "VQA_PROMPT",
      "image": {
          "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT
  }
}
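
The placeholders above can be filled in programmatically. The following is a minimal sketch that builds the request URL and JSON body from raw image bytes. The helper names `build_predict_url` and `build_vqa_request` are hypothetical, and the size check assumes the 10 MB limit means 10 × 1024 × 1024 bytes measured before base64 encoding, which the text above does not state.

```python
import base64
import json

# Assumption: 10 MB is measured on the raw image bytes, in binary megabytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def build_predict_url(project_id: str, location: str) -> str:
    """Fill PROJECT_ID and LOCATION into the imagetext predict URL."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/imagetext:predict"
    )


def build_vqa_request(image_bytes: bytes, question: str, response_count: int = 1) -> dict:
    """Build the request body with a base64-encoded image and a question."""
    if not 1 <= response_count <= 3:
        raise ValueError("response_count must be an integer from 1 to 3")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10 MB size limit")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [
            {"prompt": question, "image": {"bytesBase64Encoded": encoded}}
        ],
        "parameters": {"sampleCount": response_count},
    }


if __name__ == "__main__":
    # Example: print a request body for a tiny placeholder payload.
    print(json.dumps(build_vqa_request(b"abc", "What is this?", 2), indent=2))
```

Writing the result of `json.dumps` to a file named request.json produces a body in the shape shown above.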

To send your request, choose one of these options:

curl

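Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which logs you in to the gcloud CLI automatically. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: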

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

PowerShell

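Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: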

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2 and "prompt": "What is this?". The response returns two prediction string answers.

{
  "predictions": [
    "cappuccino",
    "coffee"
  ]
}

Python

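Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

In this sample, you use the load_from_file method to reference a local file as the base Image that you want information about. After you specify the base image, you call the ask_question method on the ImageTextModel and print the answers.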


import vertexai
from vertexai.preview.vision_models import Image, ImageTextModel

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# input_file = "input-image.png"
# question = "" # The question about the contents of the image.

vertexai.init(project=PROJECT_ID, location="us-central1")

model = ImageTextModel.from_pretrained("imagetext@001")
source_img = Image.load_from_file(location=input_file)

answers = model.ask_question(
    image=source_img,
    question=question,
    # Optional parameters
    number_of_results=1,
)

print(answers)
# Example response:
# ['tabby']

Node.js

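Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

In this sample, you call the predict method on a PredictionServiceClient. The service returns answers for the provided question.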

/**
 * TODO(developer): Update these variables before running the sample.
 */
const projectId = process.env.CAIP_PROJECT_ID;
const location = 'us-central1';
const inputFile = 'resources/cat.png';
// The question about the contents of the image.
const prompt = 'What breed of cat is this a picture of?';

const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: `${location}-aiplatform.googleapis.com`,
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function getShortFormImageResponses() {
  const fs = require('fs');
  // Configure the parent resource
  const endpoint = `projects/${projectId}/locations/${location}/publishers/google/models/imagetext@001`;

  const imageFile = fs.readFileSync(inputFile);
  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const instance = {
    prompt: prompt,
    image: {
      bytesBase64Encoded: encodedImage,
    },
  };
  const instanceValue = helpers.toValue(instance);
  const instances = [instanceValue];

  const parameter = {
    // Optional parameters
    sampleCount: 2,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  const predictions = response.predictions;
  if (predictions.length === 0) {
    console.log(
      'No responses were generated. Check the request parameters and image.'
    );
  } else {
    predictions.forEach(prediction => {
      console.log(prediction.stringValue);
    });
  }
}
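// Top-level await is not available in a CommonJS script that uses require(),
// so call the async function and report any error instead.
getShortFormImageResponses().catch(err => {
  console.error(err);
  process.exitCode = 1;
});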

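Use parameters for VQA

When you get VQA responses, there are several parameters you can set depending on your use case.

Number of results

Use the number of results parameter to limit the number of responses returned for each request you send. For more information, see the imagetext (VQA) model API reference.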

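Seed number

A number that you add to a request to make the generated responses deterministic. Adding a seed number to your request ensures that you get the same predictions (responses) each time. However, the answers aren't necessarily returned in the same order. For more information, see the imagetext (VQA) model API reference.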
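
Because seeded answers can come back in a different order, a reproducibility check should compare responses without regard to order. The following is a minimal sketch. The helper names `with_seed` and `same_answers` are hypothetical, and the parameter key "seed" is an assumption that should be confirmed against the imagetext (VQA) model API reference.

```python
from collections import Counter


def with_seed(request_body: dict, seed: int) -> dict:
    """Return a copy of the request body with a seed added to "parameters"."""
    updated = dict(request_body)
    # Assumption: the seed is passed under the key "seed" next to "sampleCount".
    updated["parameters"] = {**request_body.get("parameters", {}), "seed": seed}
    return updated


def same_answers(first: dict, second: dict) -> bool:
    """Compare two responses as multisets of answers, ignoring order."""
    return Counter(first.get("predictions", [])) == Counter(
        second.get("predictions", [])
    )


run_a = {"predictions": ["cappuccino", "coffee"]}
run_b = {"predictions": ["coffee", "cappuccino"]}
print(same_answers(run_a, run_b))  # True
```

Comparing with `Counter` instead of `set` keeps duplicate answers significant, so two responses match only if each answer appears the same number of times.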

What's next

Read articles about Imagen and other Generative AI on Vertex AI products: