Starting April 29, 2025, the Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have not previously used them, including new projects. For details, see Model versions and lifecycle.
Imagen for Captioning & VQA (imagetext) is the name of the model that supports image question and answering. Imagen for Captioning & VQA answers a question provided for a given image, even if the image hasn't been seen before by the model.

To explore this model in the console, see the Imagen for Captioning & VQA model card in the Model Garden.
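Use cases

Some common use cases for image question and answering include:

- Empower users to engage with visual content with Q&A.
- Enable customers to engage with product images shown on retail apps and websites.
- Provide accessibility options for visually impaired users.

Beyond raw REST calls, the model can also be reached through the Vertex AI SDK for Python. The following is a minimal sketch, assuming the google-cloud-aiplatform package and the imagetext@001 model version; the project ID and image path are placeholders:

import vertexai
from vertexai.vision_models import Image, ImageTextModel

# Placeholder project and region; replace with your own values.
vertexai.init(project="my-project", location="us-central1")

# Load the imagetext model and a local image to ask about.
model = ImageTextModel.from_pretrained("imagetext@001")
image = Image.load_from_file("shoe.png")  # hypothetical local file

# Ask a question; number_of_results corresponds to sampleCount (1-3).
answers = model.ask_question(
    image=image,
    question="What color is this shoe?",
    number_of_results=2,
)
print(answers)  # a list of answer strings, e.g. ["white", "beige"]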
{"instances":[{"prompt":string,"image":{// Union field can be only one of the following:"bytesBase64Encoded":string,"gcsUri":string,// End of list of possible types for union field."mimeType":string}}],"parameters":{"sampleCount":integer,"seed":integer}}
Use the following parameters for the visual Q&A generation model imagetext. For more information, see Use Visual Question Answering (VQA).

Sample request

Before using any of the request data, make the following replacements:

- PROJECT_ID: Your Google Cloud project ID.
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- VQA_PROMPT: The question about the image that you want answered. For example:
  - What color is this shoe?
  - What type of sleeves are on the shirt?
- B64_IMAGE: The image you want to ask questions about. The image must be specified as a base64-encoded byte string (see the encoding sketch after this list). Size limit: 10 MB.
- RESPONSE_COUNT: The number of answers you want to generate. Accepted integer values: 1-3.
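To produce the B64_IMAGE value, you can encode a local file with any standard base64 tool. A minimal Python sketch, where the file name is a placeholder:

import base64

# Read the local image and encode it as a base64 string for the
# "bytesBase64Encoded" field of the request body.
with open("shoe.png", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")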
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict
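Request JSON body:

{
  "instances": [
    {
      "prompt": "VQA_PROMPT",
      "image": {
        "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT
  }
}

To send your request, save the request body in a file named request.json and execute one of the following commands. These commands assume that you're signed in to the gcloud CLI with your user account, for example by running gcloud init or gcloud auth login.

curl:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

PowerShell:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content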
The following sample response is for a request with "sampleCount": 2 and "prompt": "What is this?". The response returns two prediction string answers.
{
"predictions": [
"cappuccino",
"coffee"
]
}
Response body

{
  "predictions": [
    string
  ]
}
Response elements

predictions: A list of text strings representing the VQA answers, sorted by confidence.
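Because predictions is ordered by confidence, client code can treat the first element as the best answer. A small illustrative Python snippet, assuming the response body has been saved to a file named response.json:

import json

# Load the JSON body returned by the predict call.
with open("response.json") as f:
    body = json.load(f)

# "predictions" is sorted by confidence, so index 0 is the
# model's most confident answer.
top_answer = body["predictions"][0]
print(top_answer)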
Sample response

The following sample response is for a request with "sampleCount": 2 and "prompt": "What is this?". The response returns two prediction string answers.
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-08-25 UTC."],[],[],null,["# Visual question and answering (VQA)\n\nImagen for Captioning \\& VQA (`imagetext`) is the name of the model that supports image question and\nanswering. Imagen for Captioning \\& VQA answers a question provided for a given image, even\nif it hasn't been seen before by the model.\n\nTo explore this model in the console, see the Imagen for Captioning \\& VQA model card in\nthe Model Garden.\n\n\n[View Imagen for Captioning \\& VQA model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext)\n\nUse cases\n---------\n\nSome common use cases for image question and answering include:\n\n- Empower users to engage with visual content with Q\\&A.\n- Enable customers to engage with product images shown on retail apps and websites.\n- Provide accessibility options for visually impaired users.\n\nHTTP request\n------------\n\n POST https://us-central1-aiplatform.googleapis.com/v1/projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/us-central1/publishers/google/models/imagetext:predict\n\nRequest body\n------------\n\n {\n \"instances\": [\n {\n \"prompt\": string,\n \"image\": {\n // Union field can be only one of the following:\n \"bytesBase64Encoded\": string,\n \"gcsUri\": string,\n // End of list of possible types for union field.\n \"mimeType\": string\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": integer,\n \"seed\": integer\n }\n }\n\nUse the following parameters for the visual Q\\&A generation model `imagetext`.\nFor more information, see [Use Visual Question Answering (VQA)](/vertex-ai/generative-ai/docs/image/visual-question-answering).\n\nSample request\n--------------\n\n\nBefore using any of the request data,\nmake the following replacements:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: Your Google Cloud [project ID](/resource-manager/docs/creating-managing-projects#identifiers).\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: Your project's region. For example, `us-central1`, `europe-west2`, or `asia-northeast3`. For a list of available regions, see [Generative AI on Vertex AI locations](/vertex-ai/generative-ai/docs/learn/locations-genai).\n- \u003cvar translate=\"no\"\u003eVQA_PROMPT\u003c/var\u003e: The question you want to get answered about your image.\n - *What color is this shoe?*\n - *What type of sleeves are on the shirt?*\n- \u003cvar translate=\"no\"\u003eB64_IMAGE\u003c/var\u003e: The image to get captions for. The image must be specified as a [base64-encoded](/vertex-ai/generative-ai/docs/image/base64-encode) byte string. Size limit: 10 MB.\n- \u003cvar translate=\"no\"\u003eRESPONSE_COUNT\u003c/var\u003e: The number of answers you want to generate. 
Accepted integer values: 1-3.\n\n\nHTTP method and URL:\n\n```\nPOST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\n```\n\n\nRequest JSON body:\n\n```\n{\n \"instances\": [\n {\n \"prompt\": \"VQA_PROMPT\",\n \"image\": {\n \"bytesBase64Encoded\": \"B64_IMAGE\"\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": RESPONSE_COUNT\n }\n}\n```\n\nTo send your request, choose one of these options: \n\n#### curl\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) , or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\ncurl -X POST \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\"\n```\n\n#### PowerShell\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\n$cred = gcloud auth print-access-token\n$headers = @{ \"Authorization\" = \"Bearer $cred\" }\n\nInvoke-WebRequest `\n -Method POST `\n -Headers $headers `\n -ContentType: \"application/json; charset=utf-8\" `\n -InFile request.json `\n -Uri \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\" | Select-Object -Expand Content\n```\nThe following sample responses are for a request with `\"sampleCount\": 2` and `\"prompt\": \"What is this?\"`. The response returns two prediction string answers.\n\n```\n{\n \"predictions\": [\n \"cappuccino\",\n \"coffee\"\n ]\n}\n```\n\n\u003cbr /\u003e\n\nResponse body\n-------------\n\n\n {\n \"predictions\": [\n string\n ]\n }\n\nSample response\n---------------\n\nThe following sample responses is for a request with `\"sampleCount\": 2` and\n`\"prompt\": \"What is this?\"`. The response returns two prediction string answers. \n\n {\n \"predictions\": [\n \"cappuccino\",\n \"coffee\"\n ],\n \"deployedModelId\": \"DEPLOYED_MODEL_ID\",\n \"model\": \"projects/PROJECT_ID/locations/us-central1/models/MODEL_ID\",\n \"modelDisplayName\": \"MODEL_DISPLAYNAME\",\n \"modelVersionId\": \"1\"\n }"]]