Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Stay organized with collections
Save and categorize content based on your preferences.
imagetext is the name of the model that supports image captioning. imagetext
generates a caption from an image you provide based on the language that you
specify. The model supports the following languages: English (en), German
(de), French (fr), Spanish (es) and Italian (it).
To explore this model in the console, see the Image Captioning model card in
the Model Garden.
{"instances":[{"image":{// Union field can be only one of the following:"bytesBase64Encoded":string,"gcsUri":string,// End of list of possible types for union field."mimeType":string}}],"parameters":{"sampleCount":integer,"storageUri":string,"language":string,"seed":integer}}
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-27 UTC."],[],[],null,["# Image captions\n\n| **Caution:** Starting on June 24, 2025, Imagen versions 1 and 2 are deprecated. Imagen models `imagegeneration@002`, `imagegeneration@005`, and `imagegeneration@006` will be removed on September 24, 2025 . For more information about migrating to Imagen 3, see [Migrate to\n| Imagen 3](/vertex-ai/generative-ai/docs/image/migrate-to-imagen-3).\n\n\u003cbr /\u003e\n\n`imagetext` is the name of the model that supports image captioning. `imagetext`\ngenerates a caption from an image you provide based on the language that you\nspecify. The model supports the following languages: English (`en`), German\n(`de`), French (`fr`), Spanish (`es`) and Italian (`it`).\n\nTo explore this model in the console, see the `Image Captioning` model card in\nthe Model Garden.\n\n\n[View Imagen for Captioning \\& VQA model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/imagetext)\n\nUse cases\n---------\n\nSome common use cases for image captioning include:\n\n- Creators can generate captions for uploaded images and videos (for example, a short description of a video sequence)\n- Generate captions to describe products\n- Integrate captioning with an app using the API to create new experiences\n\nHTTP request\n------------\n\n POST https://us-central1-aiplatform.googleapis.com/v1/projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/us-central1/publishers/google/models/imagetext:predict\n\nRequest body\n------------\n\n {\n \"instances\": [\n {\n \"image\": {\n // Union field can be only one of the following:\n \"bytesBase64Encoded\": string,\n \"gcsUri\": string,\n // End of list of possible types for union field.\n \"mimeType\": string\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": integer,\n \"storageUri\": string,\n \"language\": string,\n \"seed\": integer\n }\n }\n\nUse the following parameters for the Imagen model `imagetext`.\nFor more information, see\n[Get image descriptions using visual captioning](/vertex-ai/generative-ai/docs/image/image-captioning).\n\nSample request\n--------------\n\n### REST\n\nTo test a text prompt by using the Vertex AI API, send a POST request to the\npublisher model endpoint.\n\n\nBefore using any of the request data,\nmake the following replacements:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: Your Google Cloud [project ID](/resource-manager/docs/creating-managing-projects#identifiers).\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: Your project's region. For example, `us-central1`, `europe-west2`, or `asia-northeast3`. For a list of available regions, see [Generative AI on Vertex AI locations](/vertex-ai/generative-ai/docs/learn/locations-genai).\n- \u003cvar translate=\"no\"\u003eB64_IMAGE\u003c/var\u003e: The image to get captions for. The image must be specified as a [base64-encoded](/vertex-ai/generative-ai/docs/image/base64-encode) byte string. Size limit: 10 MB.\n- \u003cvar translate=\"no\"\u003eRESPONSE_COUNT\u003c/var\u003e: The number of image captions you want to generate. Accepted integer values: 1-3.\n- \u003cvar translate=\"no\"\u003eLANGUAGE_CODE\u003c/var\u003e: One of the supported language codes. Languages supported:\n - English (`en`)\n - French (`fr`)\n - German (`de`)\n - Italian (`it`)\n - Spanish (`es`)\n\n\nHTTP method and URL:\n\n```\nPOST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\n```\n\n\nRequest JSON body:\n\n```\n{\n \"instances\": [\n {\n \"image\": {\n \"bytesBase64Encoded\": \"B64_IMAGE\"\n }\n }\n ],\n \"parameters\": {\n \"sampleCount\": RESPONSE_COUNT,\n \"language\": \"LANGUAGE_CODE\"\n }\n}\n```\n\nTo send your request, choose one of these options: \n\n#### curl\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) , or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\ncurl -X POST \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\"\n```\n\n#### PowerShell\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\n$cred = gcloud auth print-access-token\n$headers = @{ \"Authorization\" = \"Bearer $cred\" }\n\nInvoke-WebRequest `\n -Method POST `\n -Headers $headers `\n -ContentType: \"application/json; charset=utf-8\" `\n -InFile request.json `\n -Uri \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict\" | Select-Object -Expand Content\n```\nThe following sample responses are for a request with `\"sampleCount\": 2`. The response returns two prediction strings.\n\n**English (`en`):** \n\n```\n{\n \"predictions\": [\n \"a yellow mug with a sheep on it sits next to a slice of cake\",\n \"a cup of coffee with a heart shaped latte art next to a slice of cake\"\n ],\n \"deployedModelId\": \"DEPLOYED_MODEL_ID\",\n \"model\": \"projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID\",\n \"modelDisplayName\": \"MODEL_DISPLAYNAME\",\n \"modelVersionId\": \"1\"\n}\n```\n\n**Spanish (`es`):**\n\n```\n{\n \"predictions\": [\n \"una taza de café junto a un plato de pastel de chocolate\",\n \"una taza de café con una forma de corazón en la espuma\"\n ]\n}\n```\n\n\u003cbr /\u003e\n\nResponse body\n-------------\n\n {\n \"predictions\": [ string ]\n }\n\nSample response\n---------------\n\n {\n \"predictions\": [\n \"text1\",\n \"text2\"\n ]\n }"]]