A raw reference image is required for editing use cases.
A raw reference image isn't needed for other use cases.
A request can include at most one raw reference image.
The output image has the same dimensions as the raw reference image.
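As a sketch, a raw reference entry in the request body could look like the following. The enum value REFERENCE_TYPE_RAW is assumed by analogy with the mask, control, subject, and style types listed below, and the base64 string is a placeholder:

```python
# A raw reference image entry (sketch). REFERENCE_TYPE_RAW is assumed
# by analogy with the other reference types; the bytes are a placeholder.
raw_reference = {
    "referenceType": "REFERENCE_TYPE_RAW",
    "referenceId": 1,  # referred to as [1] in the prompt
    "referenceImage": {"bytesBase64Encoded": "<BASE64_RAW_IMAGE>"},
}
```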
REFERENCE_TYPE_MASK
A mask reference image is required for masked editing use cases.
A mask reference image isn't required for other use cases.
If a raw reference image is present, the mask image must be the same size as the raw reference image.
The user can either provide their own mask or let Imagen compute one from the provided reference image.
If the mask reference image is empty and maskMode isn't set to MASK_MODE_USER_PROVIDED, the mask is computed from the raw reference image.
REFERENCE_TYPE_CONTROL
If a raw reference image is present, the control image must be the same size as the raw reference image.
If the control reference image is empty and enableControlImageComputation is set to true, the control image is computed from the raw reference image.
REFERENCE_TYPE_SUBJECT
The user can provide multiple reference images with the same reference ID. For example, multiple images of the same subject can share a reference ID, which can improve output quality.
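For example, a referenceImages array with two subject images that share referenceId 1 might be built as follows (a sketch; the base64 strings are placeholders, and the subjectImageConfig fields follow the parameters documented below):

```python
import json

# Two reference images of the same subject share referenceId 1,
# so the prompt can refer to both as [1].
reference_images = [
    {
        "referenceType": "REFERENCE_TYPE_SUBJECT",
        "referenceId": 1,
        "referenceImage": {"bytesBase64Encoded": "<BASE64_IMAGE_1>"},  # placeholder
        "subjectImageConfig": {
            "subjectDescription": "man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON",
        },
    },
    {
        "referenceType": "REFERENCE_TYPE_SUBJECT",
        "referenceId": 1,  # same ID: a second image of the same subject
        "referenceImage": {"bytesBase64Encoded": "<BASE64_IMAGE_2>"},  # placeholder
        "subjectImageConfig": {
            "subjectDescription": "man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON",
        },
    },
]
print(json.dumps(reference_images, indent=2))
```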
REFERENCE_TYPE_STYLE
referenceId
Required integer
The reference ID. Use this reference ID in the prompt. For example,
use [1] to refer to the reference images with referenceId=1, [2] to
refer to the reference images with referenceId=2.
referenceImage.bytesBase64Encoded
Required string
A Base64 string for the encoded reference image.
maskImageConfig.maskMode
Optional enumeration:
MASK_MODE_USER_PROVIDED, if the reference image is a mask image.
MASK_MODE_BACKGROUND, to automatically generate a mask using background segmentation.
MASK_MODE_FOREGROUND, to automatically generate a mask using foreground segmentation.
MASK_MODE_SEMANTIC, to automatically generate a mask using semantic segmentation, and the given mask class.
Specified when referenceType is set as REFERENCE_TYPE_MASK.
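Putting the maskMode options to use, a mask reference that asks Imagen to compute a background mask could be expressed as follows (a sketch using only the fields documented above; no mask bytes are provided, so the mask is computed from the raw reference image):

```python
# A computed mask: no referenceImage bytes are supplied, and maskMode
# selects background segmentation instead of MASK_MODE_USER_PROVIDED.
mask_reference = {
    "referenceType": "REFERENCE_TYPE_MASK",
    "referenceId": 2,
    "maskImageConfig": {
        "maskMode": "MASK_MODE_BACKGROUND",
        "dilation": 0.01,  # dilate the mask by 1% of image width
    },
}
```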
maskImageConfig.dilation
Optional float. Range: [0, 1]
The percentage of image width to dilate this mask by.
Specified when referenceType is set as REFERENCE_TYPE_MASK.
controlImageConfig.enableControlImageComputation
Optional bool.
Default: false.
Set to false if you provide your own control image.
Set to true if you want to let Imagen compute the control
image from the reference image.
Specified when referenceType is set as REFERENCE_TYPE_CONTROL.
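A minimal control reference sketch, showing only the field documented above, might look like this (letting Imagen compute the control image rather than supplying one):

```python
# Let Imagen compute the control image from the raw reference image,
# so no control image bytes are supplied here.
control_reference = {
    "referenceType": "REFERENCE_TYPE_CONTROL",
    "referenceId": 3,
    "controlImageConfig": {
        "enableControlImageComputation": True,
    },
}
```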
language
Optional: string (imagen-3.0-capability-001,
imagen-3.0-generate-001, and
imagegeneration@006 only)
The language code that corresponds to your text prompt language.
The following values are supported:
auto: Automatic detection. If Imagen
detects a supported language, the prompt and the optional negative
prompt are translated to English. If the detected language isn't
supported, Imagen uses the input text verbatim, which
might result in unexpected output. No error code is returned.
en: English (the default value if omitted)
es: Spanish
hi: Hindi
ja: Japanese
ko: Korean
pt: Portuguese
zh-TW: Chinese (traditional)
zh or zh-CN: Chinese (simplified)
subjectImageConfig.subjectDescription
Required string.
A short description of the subject in the image. For example, a woman
with short brown hair.
Specified when referenceType is set as REFERENCE_TYPE_SUBJECT.
subjectImageConfig.subjectType
Required enumeration:
SUBJECT_TYPE_PERSON: Person subject type.
SUBJECT_TYPE_ANIMAL: Animal subject type.
SUBJECT_TYPE_PRODUCT: Product subject type.
SUBJECT_TYPE_DEFAULT: Default subject type.
Specified when referenceType is set as REFERENCE_TYPE_SUBJECT.
styleImageConfig.styleDescription
Optional string.
A short description for the style.
Specified when referenceType is set as REFERENCE_TYPE_STYLE.
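A style reference sketch combining the fields above (the base64 string is a placeholder, and the style description is an arbitrary example):

```python
# A style reference image with an optional short style description.
style_reference = {
    "referenceType": "REFERENCE_TYPE_STYLE",
    "referenceId": 1,
    "referenceImage": {"bytesBase64Encoded": "<BASE64_STYLE_IMAGE>"},  # placeholder
    "styleImageConfig": {"styleDescription": "watercolor painting"},
}
```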
Response
The response body from the REST request.
Parameter
predictions
An array of
VisionGenerativeModelResult objects,
one for each requested sampleCount. Any images
filtered by responsible AI are omitted from the array.
Vision generative model result object
Information about the model result.
Parameter
bytesBase64Encoded
The base64 encoded generated image. Not present if the output image
did not pass responsible AI filters.
mimeType
The type of the generated image. Not present if the output image did
not pass responsible AI filters.
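Assuming a parsed JSON response, the predictions can be decoded from base64 and paired with a filename extension derived from mimeType. This is a sketch; the response dict below is a stand-in with dummy PNG header bytes:

```python
import base64

# Stand-in for a parsed response: one prediction that passed the filters.
response = {
    "predictions": [
        {
            "bytesBase64Encoded": base64.b64encode(b"\x89PNG\r\n").decode("ascii"),
            "mimeType": "image/png",
        }
    ]
}

images = []
for i, prediction in enumerate(response["predictions"]):
    data = base64.b64decode(prediction["bytesBase64Encoded"])
    ext = prediction["mimeType"].split("/")[-1]  # e.g. "png"
    images.append((f"image_{i}.{ext}", data))

print(images[0][0])  # image_0.png
```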
Examples
The following examples show how to use the Imagen model
to customize images.
Customize images
REST
Before using any of the request data,
make the following replacements:
LOCATION: Your project's region. For example,
us-central1, europe-west2, or asia-northeast3. For a list
of available regions, see
Generative AI on Vertex AI locations.
TEXT_PROMPT: The text prompt that guides what images the model
generates. To use Imagen 3 Customization, include the referenceId of
the reference image or images
you provide, in the format [$referenceId]. For example:
The following text prompt is for a request that has two reference images with
"referenceId": 1. Both images have an optional
description of "subjectDescription": "man with short hair":
Create an image about a man with short hair to match the description: A
pencil style sketch of a full-body portrait of a man with short hair [1] with
hatch-cross
drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil
drawing, high quality, pencil stroke, looking at camera, natural human eyes
"referenceId": The ID of the reference image, or the ID for a series of reference
images that correspond to the same subject or style. In this example the two reference images
are of the same person, so they share the same referenceId (1).
BASE64_REFERENCE_IMAGE: A reference image to guide image generation. The
image must be specified as a base64-encoded byte
string.
SUBJECT_DESCRIPTION: Optional. A text description of the reference image you can
then use in the prompt field. For example:
"prompt": "a full-body portrait of a man with short hair [1] with hatch-cross
drawing",
[...],
"subjectDescription": "man with short hair"
IMAGE_COUNT: The number of generated images.
Accepted integer values: 1-4. Default value: 4.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict
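Putting the pieces together, the request body for the predict call can be sketched as follows. PROJECT_ID and the base64 image are placeholders, the instance shape follows the parameters documented above, and the POST itself is shown only as a comment rather than executed:

```python
import json

PROJECT_ID = "PROJECT_ID"  # placeholder
LOCATION = "us-central1"   # any supported region

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/"
    "imagen-3.0-capability-001:predict"
)

payload = {
    "instances": [
        {
            # [1] refers to the reference image with referenceId=1.
            "prompt": "A pencil style sketch of a full-body portrait of "
                      "a man with short hair [1]",
            "referenceImages": [
                {
                    "referenceType": "REFERENCE_TYPE_SUBJECT",
                    "referenceId": 1,
                    "referenceImage": {
                        "bytesBase64Encoded": "<BASE64_REFERENCE_IMAGE>"
                    },
                    "subjectImageConfig": {
                        "subjectDescription": "man with short hair",
                        "subjectType": "SUBJECT_TYPE_PERSON",
                    },
                }
            ],
        }
    ],
    "parameters": {"sampleCount": 2},
}
body = json.dumps(payload)
# Send with, for example:
#   curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d "$body" "$url"
```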
The following sample response is for a request with
"sampleCount": 2. The response returns two prediction objects, with
the generated image bytes base64-encoded.
Last updated 2025-08-29 UTC.