Insert objects into an image using inpaint

This guide shows you how to insert objects into an image using inpainting. You can use one of the following methods: a defined mask area or automatic mask detection.

The following example shows how you can insert objects into an image:

Content insertion example

Inpainting lets you use a base image, a mask, and a text prompt to add content to an existing image.

Inputs

  • Base image* to edit: a glass jar of red liquid with a lemon slice on the rim, a straw, and lemon slices in the left foreground.
  • Mask area: specified using tools in the Google Cloud console.
  • Text prompt: strawberries

* Image credit: Alex Lvrs on Unsplash.

Output after specifying a mask area in the Google Cloud console

Each generated edit replaces the lemon slices in the image foreground with strawberries. For example:

  • Two strawberries directly in front of the jar.
  • Three strawberries just to the left of the jar.
  • Two strawberries slightly in front of and to the left of the jar.

View Imagen for Editing and Customization model card

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

    Enable the API

  5. Set up authentication for your environment.

    Select the tab for how you plan to use the samples on this page:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    Java

    To use the Java samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

      Install the Google Cloud CLI.

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

      If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    Node.js

    To use the Node.js samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

      Install the Google Cloud CLI.

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

      If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    Python

    To use the Python samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

      Install the Google Cloud CLI.

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

      If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

      If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    For more information, see Set up ADC for a local development environment in the Google Cloud authentication documentation.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI.

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Choose a masking method

Imagen on Vertex AI offers two methods for creating a mask for inpainting. Use the following table to help you choose the best method for your use case.

Defined mask area
  • Description: You provide a precise mask by uploading a file or drawing directly in the console.
  • Pros: High precision and complete control over the edit area.
  • Use case: Best for edits that require exact placement or have complex shapes that are difficult for a model to detect automatically.

Automatic mask detection
  • Description: The model automatically identifies key elements (for example, background, foreground, or people) and generates a mask for you.
  • Pros: Fast and convenient, requiring no manual effort to create a mask.
  • Use case: Ideal for quick edits where the target area is a common, easily detectable element, such as replacing the entire background or a person.
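
If you use the defined mask area method, you can also create the mask file programmatically instead of drawing it in the console. The following sketch uses the Pillow library (an assumption; it isn't required by Imagen) to build a black-and-white mask PNG in which white pixels mark the area to edit. The file names and rectangle coordinates are placeholders.

# Create a user-provided mask with Pillow (optional helper, not part of the
# Imagen API). White (255) pixels mark the area to edit; black (0) pixels
# are preserved. File names and coordinates are placeholders.
from PIL import Image, ImageDraw

base = Image.open("input-image.png")

# Start with an all-black mask that matches the base image dimensions.
mask = Image.new("L", base.size, color=0)

# Paint the region to edit in white. Replace the coordinates with the
# bounding box of the area where the model should insert content.
draw = ImageDraw.Draw(mask)
draw.rectangle([100, 400, 400, 700], fill=255)

mask.save("mask-image.png")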

Insert with a defined mask area

Use the following samples to insert content by providing a base image, a text prompt, and a mask that defines the area to modify.

Imagen 3

Use the following samples to send an inpainting request using the Imagen 3 model.

Console

  1. In the Google Cloud console, go to the Vertex AI > Media Studio page.

    Go to Media Studio

  2. Click Upload and select an image file.
  3. Click Inpaint.
  4. Specify the mask area by doing one of the following:

    • Upload a mask file: Click Upload mask and select your mask file.
    • Draw a mask: In the editing toolbar, use the mask tools (box, brush, or invert tool) to specify the area or areas to add content to.
  5. Optional: In the Parameters panel, adjust any of the following options:

    • Model: the Imagen model to use
    • Number of results: the number of results to generate
    • Negative prompt: items to avoid generating
  6. In the prompt field, enter a text prompt that describes the content to add.
  7. Click Generate.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import RawReferenceImage, MaskReferenceImage, MaskReferenceConfig, EditImageConfig, Image

client = genai.Client()

# TODO(developer): Update and un-comment below line
# output_file = "output-image.png"

raw_ref = RawReferenceImage(
    reference_image=Image.from_file(location='test_resources/fruit.png'), reference_id=0)
mask_ref = MaskReferenceImage(
    reference_id=1,
    reference_image=Image.from_file(location='test_resources/fruit_mask.png'),
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_USER_PROVIDED",
        mask_dilation=0.01,
    ),
)

image = client.models.edit_image(
    model="imagen-3.0-capability-001",
    prompt="A plate of cookies",
    reference_images=[raw_ref, mask_ref],
    config=EditImageConfig(
        edit_mode="EDIT_MODE_INPAINT_INSERTION",
    ),
)

image.generated_images[0].image.save(output_file)

print(f"Created output image using {len(image.generated_images[0].image.image_bytes)} bytes")
# Example response:
# Created output image using 1234567 bytes
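
The preceding sample saves a single result. If you want several candidate edits in one request, similar to the Number of results option in the console, the following sketch requests multiple images and saves each one. It assumes that EditImageConfig in your SDK version accepts a number_of_images field; the file names and prompt are placeholders.

# A sketch for requesting and saving several edited images in one call.
# Assumes EditImageConfig supports `number_of_images` in your SDK version;
# file names and the prompt are placeholders.
from google import genai
from google.genai.types import (
    RawReferenceImage,
    MaskReferenceImage,
    MaskReferenceConfig,
    EditImageConfig,
    Image,
)

client = genai.Client()

raw_ref = RawReferenceImage(
    reference_image=Image.from_file(location="test_resources/fruit.png"),
    reference_id=0,
)
mask_ref = MaskReferenceImage(
    reference_id=1,
    reference_image=Image.from_file(location="test_resources/fruit_mask.png"),
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_USER_PROVIDED",
        mask_dilation=0.01,
    ),
)

response = client.models.edit_image(
    model="imagen-3.0-capability-001",
    prompt="A plate of cookies",
    reference_images=[raw_ref, mask_ref],
    config=EditImageConfig(
        edit_mode="EDIT_MODE_INPAINT_INSERTION",
        number_of_images=4,
    ),
)

# Save each returned image to its own file.
for i, generated in enumerate(response.generated_images):
    generated.image.save(f"output-image-{i + 1}.png")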

REST

For more information, see the Edit images API reference.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
  • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
  • TEXT_PROMPT: The text prompt guides what images the model generates. When you use a prompt for inpainting insertion, use a description of the masked area for best results. Avoid single-word prompts. For example, use "a cute corgi" instead of "corgi".
  • B64_BASE_IMAGE: The base image to edit or upscale. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
  • B64_MASK_IMAGE: The black and white image you want to use as a mask layer to edit the original image. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
  • MASK_DILATION: A float. The percentage of image width to dilate this mask by. A value of 0.01 is recommended to compensate for imperfect input masks.
  • EDIT_STEPS: An integer. The number of sampling steps for the base model. For inpainting insertion, start at 35 steps. If the quality doesn't meet your requirements, increase the number of steps up to the upper limit of 75. Increasing steps also increases request latency.
  • EDIT_IMAGE_COUNT: The number of edited images. Accepted integer values: 1-4. Default value: 4.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_RAW",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "B64_BASE_IMAGE"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_MASK",
          "referenceId": 2,
          "referenceImage": {
            "bytesBase64Encoded": "B64_MASK_IMAGE"
          },
          "maskImageConfig": {
            "maskMode": "MASK_MODE_USER_PROVIDED",
            "dilation": MASK_DILATION
          }
        }
      ]
    }
  ],
  "parameters": {
    "editConfig": {
      "baseSteps": EDIT_STEPS
    },
    "editMode": "EDIT_MODE_INPAINT_INSERTION",
    "sampleCount": EDIT_IMAGE_COUNT
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
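
The B64_BASE_IMAGE and B64_MASK_IMAGE values are the base64-encoded bytes of your image files. If you prefer to build request.json in a script rather than paste those strings by hand, the following sketch shows one way to do it; the file names, prompt, and parameter values are placeholders.

# A sketch for assembling request.json for the Imagen 3 inpainting request.
# File names, prompt, and parameter values are placeholders.
import base64
import json

def encode_image(path: str) -> str:
    """Return the base64-encoded contents of an image file as a string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

request_body = {
    "instances": [
        {
            "prompt": "a plate of strawberries",
            "referenceImages": [
                {
                    "referenceType": "REFERENCE_TYPE_RAW",
                    "referenceId": 1,
                    "referenceImage": {
                        "bytesBase64Encoded": encode_image("input-image.png")
                    },
                },
                {
                    "referenceType": "REFERENCE_TYPE_MASK",
                    "referenceId": 2,
                    "referenceImage": {
                        "bytesBase64Encoded": encode_image("mask-image.png")
                    },
                    "maskImageConfig": {
                        "maskMode": "MASK_MODE_USER_PROVIDED",
                        "dilation": 0.01,
                    },
                },
            ],
        }
    ],
    "parameters": {
        "editConfig": {"baseSteps": 35},
        "editMode": "EDIT_MODE_INPAINT_INSERTION",
        "sampleCount": 4,
    },
}

with open("request.json", "w") as f:
    json.dump(request_body, f)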

Imagen 2

Use the following samples to send an inpainting request using the Imagen 2 model.

Console

  1. In the Google Cloud console, go to the Vertex AI > Media Studio page.

    Go to Media Studio

  2. Click Upload and select an image file.
  3. Click Inpaint.
  4. Specify the mask area by doing one of the following:

    • Upload a mask file: Click Upload mask and select your mask file.
    • Draw a mask: In the editing toolbar, use the mask tools (box, brush, or invert tool) to specify the area or areas to add content to.
  5. Optional: In the Parameters panel, adjust any of the following options:

    • Model: the Imagen model to use
    • Number of results: the number of results to generate
    • Negative prompt: items to avoid generating
  6. In the prompt field, enter a text prompt that describes the content to add.
  7. Click Generate.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.


import vertexai
from vertexai.preview.vision_models import Image, ImageGenerationModel

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# input_file = "input-image.png"
# mask_file = "mask-image.png"
# output_file = "output-image.png"
# prompt = "red hat" # The text prompt describing what you want to see inserted.

vertexai.init(project=PROJECT_ID, location="us-central1")

model = ImageGenerationModel.from_pretrained("imagegeneration@006")
base_img = Image.load_from_file(location=input_file)
mask_img = Image.load_from_file(location=mask_file)

images = model.edit_image(
    base_image=base_img,
    mask=mask_img,
    prompt=prompt,
    edit_mode="inpainting-insert",
)

images[0].save(location=output_file, include_generation_parameters=False)

# Optional. View the edited image in a notebook.
# images[0].show()

print(f"Created output image using {len(images[0]._image_bytes)} bytes")
# Example response:
# Created output image using 1400814 bytes

REST

For more information about imagegeneration model requests, see the imagegeneration model API reference.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
  • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
  • TEXT_PROMPT: The text prompt that guides what images the model generates. This field is required for both generation and editing.
  • B64_BASE_IMAGE: The base image to edit or upscale. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
  • B64_MASK_IMAGE: The black and white image you want to use as a mask layer to edit the original image. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
  • EDIT_IMAGE_COUNT: The number of edited images. Default value: 4.
  • GUIDANCE_SCALE_VALUE: A parameter (integer) that controls how much the model adheres to the text prompt. Larger values increase alignment between the text prompt and generated images, but may compromise image quality. Values: 0 - 500. Default: 60.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagegeneration@006:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "image": {
          "bytesBase64Encoded": "B64_BASE_IMAGE"
      },
      "mask": {
        "image": {
          "bytesBase64Encoded": "B64_MASK_IMAGE"
        }
      }
    }
  ],
  "parameters": {
    "sampleCount": EDIT_IMAGE_COUNT,
    "editConfig": {
      "editMode": "inpainting-insert",
      "guidanceScale": GUIDANCE_SCALE_VALUE
    }
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagegeneration@006:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagegeneration@006:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
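
The predictions contain base64-encoded image bytes rather than image files. If you save the API response to a file, for example by redirecting the curl output to response.json, you can decode the images with a short script such as the following sketch; the file names are placeholders.

# A sketch for decoding the base64-encoded predictions from a saved response.
# Assumes the API response was written to response.json; file names are
# placeholders.
import base64
import json

with open("response.json") as f:
    response = json.load(f)

# Each prediction holds the edited image bytes as a base64 string.
for i, prediction in enumerate(response.get("predictions", [])):
    image_bytes = base64.b64decode(prediction["bytesBase64Encoded"])
    filename = f"edited-image-{i + 1}.png"
    with open(filename, "wb") as out:
        out.write(image_bytes)
    print(f"Wrote {filename} ({len(image_bytes)} bytes)")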

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

In this sample, you specify the model as part of an EndpointName. The EndpointName is passed to the predict method, which is called on a PredictionServiceClient. The service returns an edited version of the image, which is then saved locally.

For more information about model versions and features, see Imagen models.


import com.google.api.gax.rpc.ApiException;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.gson.Gson;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EditImageInpaintingInsertMaskSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "my-project-id";
    String location = "us-central1";
    String inputPath = "/path/to/my-input.png";
    String maskPath = "/path/to/my-mask.png";
    String prompt =
        ""; // The text prompt describing what you want to see inserted in the mask area.

    editImageInpaintingInsertMask(projectId, location, inputPath, maskPath, prompt);
  }

  // Edit an image using a mask file. Inpainting can insert the object designated by the prompt
  // into the masked area.
  public static PredictResponse editImageInpaintingInsertMask(
      String projectId, String location, String inputPath, String maskPath, String prompt)
      throws ApiException, IOException {
    final String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {

      final EndpointName endpointName =
          EndpointName.ofProjectLocationPublisherModelName(
              projectId, location, "google", "imagegeneration@006");

      // Encode image and mask to Base64
      String imageBase64 =
          Base64.getEncoder().encodeToString(Files.readAllBytes(Paths.get(inputPath)));
      String maskBase64 =
          Base64.getEncoder().encodeToString(Files.readAllBytes(Paths.get(maskPath)));

      // Create the image and image mask maps
      Map<String, String> imageMap = new HashMap<>();
      imageMap.put("bytesBase64Encoded", imageBase64);

      Map<String, String> maskMap = new HashMap<>();
      maskMap.put("bytesBase64Encoded", maskBase64);
      Map<String, Map> imageMaskMap = new HashMap<>();
      imageMaskMap.put("image", maskMap);

      Map<String, Object> instancesMap = new HashMap<>();
      instancesMap.put("prompt", prompt); // [ "prompt", "<my-prompt>" ]
      instancesMap.put(
          "image", imageMap); // [ "image", [ "bytesBase64Encoded", "iVBORw0KGgo...==" ] ]
      instancesMap.put(
          "mask",
          imageMaskMap); // [ "mask", [ "image", [ "bytesBase64Encoded", "iJKDF0KGpl...==" ] ] ]
      instancesMap.put("editMode", "inpainting-insert"); // [ "editMode", "inpainting-insert" ]
      Value instances = mapToValue(instancesMap);

      // Optional parameters
      Map<String, Object> paramsMap = new HashMap<>();
      paramsMap.put("sampleCount", 1);
      Value parameters = mapToValue(paramsMap);

      PredictResponse predictResponse =
          predictionServiceClient.predict(
              endpointName, Collections.singletonList(instances), parameters);

      for (Value prediction : predictResponse.getPredictionsList()) {
        Map<String, Value> fieldsMap = prediction.getStructValue().getFieldsMap();
        if (fieldsMap.containsKey("bytesBase64Encoded")) {
          String bytesBase64Encoded = fieldsMap.get("bytesBase64Encoded").getStringValue();
          Path tmpPath = Files.createTempFile("imagen-", ".png");
          Files.write(tmpPath, Base64.getDecoder().decode(bytesBase64Encoded));
          System.out.format("Image file written to: %s\n", tmpPath.toUri());
        }
      }
      return predictResponse;
    }
  }

  private static Value mapToValue(Map<String, Object> map) throws InvalidProtocolBufferException {
    Gson gson = new Gson();
    String json = gson.toJson(map);
    Value.Builder builder = Value.newBuilder();
    JsonFormat.parser().merge(json, builder);
    return builder.build();
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

In this sample, you call the predict method on a PredictionServiceClient. The service generates images that are then saved to a local file. For more information about model versions and features, see Imagen models.

/**
 * TODO(developer): Update these variables before running the sample.
 */
const projectId = process.env.CAIP_PROJECT_ID;
const location = 'us-central1';
const inputFile = 'resources/woman.png';
const maskFile = 'resources/woman_inpainting_insert_mask.png';
const prompt = 'hat';

const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: `${location}-aiplatform.googleapis.com`,
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function editImageInpaintingInsertMask() {
  const fs = require('fs');
  const util = require('util');
  // Configure the parent resource
  const endpoint = `projects/${projectId}/locations/${location}/publishers/google/models/imagegeneration@006`;

  const imageFile = fs.readFileSync(inputFile);
  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const maskImageFile = fs.readFileSync(maskFile);
  // Convert the image mask data to a Buffer and base64 encode it.
  const encodedMask = Buffer.from(maskImageFile).toString('base64');

  const promptObj = {
    prompt: prompt, // The text prompt describing what you want to see inserted
    editMode: 'inpainting-insert',
    image: {
      bytesBase64Encoded: encodedImage,
    },
    mask: {
      image: {
        bytesBase64Encoded: encodedMask,
      },
    },
  };
  const instanceValue = helpers.toValue(promptObj);
  const instances = [instanceValue];

  const parameter = {
    // Optional parameters
    seed: 100,
    // Controls the strength of the prompt
    // 0-9 (low strength), 10-20 (medium strength), 21+ (high strength)
    guidanceScale: 21,
    sampleCount: 1,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  const predictions = response.predictions;
  if (predictions.length === 0) {
    console.log(
      'No image was generated. Check the request parameters and prompt.'
    );
  } else {
    let i = 1;
    for (const prediction of predictions) {
      const buff = Buffer.from(
        prediction.structValue.fields.bytesBase64Encoded.stringValue,
        'base64'
      );
      // Write image content to the output file
      const writeFile = util.promisify(fs.writeFile);
      const filename = `output${i}.png`;
      await writeFile(filename, buff);
      console.log(`Saved image ${filename}`);
      i++;
    }
  }
}
await editImageInpaintingInsertMask();

Insert with automatic mask detection

Use the following samples to insert content by providing a base image and a text prompt. Imagen automatically detects an object in the image and creates a mask to define the area to modify.

Imagen 3

Use the following samples to send an inpainting request using the Imagen 3 model.

Console

  1. In the Google Cloud console, go to the Vertex AI > Media Studio page.

    Go to Media Studio

  2. Click Upload and select an image file.
  3. Click Inpaint.
  4. In the editing toolbar, click Extract mask.
  5. Select one of the mask extraction options:

    • Background elements: Detects the background elements and creates a mask around them.
    • Foreground elements: Detects the foreground objects and creates a mask around them.
    • People: Detects people and creates a mask around them.
  6. Optional: In the Parameters panel, adjust any of the following options:

    • Model: the Imagen model to use
    • Number of results: the number of results to generate
    • Negative prompt: items to avoid generating
  7. In the prompt field, enter a text prompt that describes the content to add.
  8. Click Generate.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import RawReferenceImage, MaskReferenceImage, MaskReferenceConfig, EditImageConfig, Image

client = genai.Client()

# TODO(developer): Update and un-comment below line
# output_file = "output-image.png"

raw_ref = RawReferenceImage(
    reference_image=Image.from_file(location='test_resources/fruit.png'), reference_id=0)
mask_ref = MaskReferenceImage(
    reference_id=1,
    reference_image=None,
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_FOREGROUND",
        mask_dilation=0.1,
    ),
)

image = client.models.edit_image(
    model="imagen-3.0-capability-001",
    prompt="A small white ceramic bowl with lemons and limes",
    reference_images=[raw_ref, mask_ref],
    config=EditImageConfig(
        edit_mode="EDIT_MODE_INPAINT_INSERTION",
    ),
)

image.generated_images[0].image.save(output_file)

print(f"Created output image using {len(image.generated_images[0].image.image_bytes)} bytes")
# Example response:
# Created output image using 1234567 bytes
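
In addition to foreground and background detection, the REST reference that follows describes a semantic segmentation mode (MASK_MODE_SEMANTIC) that masks specific object classes. The following sketch shows how that mode might look with the Gen AI SDK, assuming MaskReferenceConfig in your SDK version supports a segmentation_classes field; the class IDs, file names, and prompt are placeholders.

# A sketch for requesting a semantic segmentation mask instead of a
# foreground or background mask. Assumes MaskReferenceConfig supports a
# `segmentation_classes` field in your SDK version; class IDs, file names,
# and the prompt are placeholders.
from google import genai
from google.genai.types import (
    RawReferenceImage,
    MaskReferenceImage,
    MaskReferenceConfig,
    EditImageConfig,
    Image,
)

client = genai.Client()

raw_ref = RawReferenceImage(
    reference_image=Image.from_file(location="test_resources/street.png"),
    reference_id=0,
)
mask_ref = MaskReferenceImage(
    reference_id=1,
    reference_image=None,
    config=MaskReferenceConfig(
        mask_mode="MASK_MODE_SEMANTIC",
        segmentation_classes=[175, 176],  # Example class IDs: bicycle, car.
        mask_dilation=0.01,
    ),
)

response = client.models.edit_image(
    model="imagen-3.0-capability-001",
    prompt="A red bicycle leaning against a brick wall",
    reference_images=[raw_ref, mask_ref],
    config=EditImageConfig(edit_mode="EDIT_MODE_INPAINT_INSERTION"),
)

response.generated_images[0].image.save("output-image.png")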

REST

For more information, see the Edit images API reference.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
  • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
  • TEXT_PROMPT: The text prompt guides what images the model generates. When you use a prompt for inpainting insertion, use a description of the masked area for best results. Avoid single-word prompts. For example, use "a cute corgi" instead of "corgi".
  • B64_BASE_IMAGE: The base image to edit or upscale. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
  • MASK_MODE: A string that sets the type of automatic mask creation the model uses. Available values:
    • MASK_MODE_BACKGROUND: Automatically generates a mask using background segmentation.
    • MASK_MODE_FOREGROUND: Automatically generates a mask using foreground segmentation.
    • MASK_MODE_SEMANTIC: Automatically generates a mask using semantic segmentation based on the segmentation classes you specify in the maskImageConfig.maskClasses array. For example:

      "maskImageConfig": {
        "maskMode": "MASK_MODE_SEMANTIC",
        "maskClasses": [175, 176], // bicycle, car
        "dilation": 0.01
      }

  • MASK_DILATION: A float. The percentage of image width to dilate this mask by. A value of 0.01 is recommended to compensate for imperfect input masks.
  • EDIT_STEPS: An integer. The number of sampling steps for the base model. For inpainting insertion, start at 35 steps. If the quality doesn't meet your requirements, increase the number of steps up to the upper limit of 75. Increasing steps also increases request latency.
  • EDIT_IMAGE_COUNT: The number of edited images. Accepted integer values: 1-4. Default value: 4.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_RAW",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "B64_BASE_IMAGE"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_MASK",
          "referenceId": 2,
          "maskImageConfig": {
            "maskMode": "MASK_MODE",
            "dilation": MASK_DILATION
          }
        }
      ]
    }
  ],
  "parameters": {
    "editConfig": {
      "baseSteps": EDIT_STEPS
    },
    "editMode": "EDIT_MODE_INPAINT_INSERTION",
    "sampleCount": EDIT_IMAGE_COUNT
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}

Imagen 2

Use the following samples to send an inpainting request using the Imagen 2 model.

Console

  1. In the Google Cloud console, go to the Vertex AI > Media Studio page.

    Go to Media Studio

  2. In the lower task panel, click Edit image.

  3. Click Upload and select the image to edit.

  4. In the editing toolbar, click Extract.

  5. Select one of the mask extraction options:

    • Background elements: Detects the background elements and creates a mask around them.
    • Foreground elements: Detects the foreground objects and creates a mask around them.
    • People: Detects people and creates a mask around them.
  6. Optional: In the Parameters panel, adjust the Number of results, Negative prompt, Text prompt guidance, or other parameters.

  7. In the prompt field, enter a text prompt that describes the content to add.

  8. Click Generate.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.


import vertexai
from vertexai.preview.vision_models import Image, ImageGenerationModel

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# input_file = "input-image.png"
# mask_mode = "background" # 'background', 'foreground', or 'semantic'
# output_file = "output-image.png"
# prompt = "beach" # The text prompt describing what you want to see inserted.

vertexai.init(project=PROJECT_ID, location="us-central1")

model = ImageGenerationModel.from_pretrained("imagegeneration@006")
base_img = Image.load_from_file(location=input_file)

images = model.edit_image(
    base_image=base_img,
    mask_mode=mask_mode,
    prompt=prompt,
    edit_mode="inpainting-insert",
)

images[0].save(location=output_file, include_generation_parameters=False)

# Optional. View the edited image in a notebook.
# images[0].show()

print(f"Created output image using {len(images[0]._image_bytes)} bytes")
# Example response:
# Created output image using 1234567 bytes

REST

For more information about imagegeneration model requests, see the imagegeneration model API reference.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
  • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
  • TEXT_PROMPT: The text prompt that guides what images the model generates. This field is required for both generation and editing.
  • B64_BASE_IMAGE: The base image to edit or upscale. The image must be specified as a base64-encoded byte string. Size limit: 10 MB.
  • EDIT_IMAGE_COUNT: The number of edited images. Default value: 4.
  • MASK_TYPE: Prompts the model to generate a mask instead of you needing to provide one. Consequently, when you provide this parameter, you should omit a mask object. Available values:
    • background: Automatically generates a mask of all regions except the primary object, person, or subject in the image.
    • foreground: Automatically generates a mask of the primary object, person, or subject in the image.
    • semantic: Use automatic segmentation to create a mask area for one or more of the segmentation classes. Set the segmentation classes using the classes parameter and the corresponding class_id values. You can specify up to 5 classes. When you use the semantic mask type, the maskMode object should look like the following:
      "maskMode": {
        "maskType": "semantic",
        "classes": [class_id1, class_id2]
      }
  • GUIDANCE_SCALE_VALUE: A parameter (integer) that controls how much the model adheres to the text prompt. Larger values increase alignment between the text prompt and generated images, but may compromise image quality. Values: 0 - 500. Default: 60.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagegeneration@006:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "image": {
        "bytesBase64Encoded": "B64_BASE_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": EDIT_IMAGE_COUNT,
    "editConfig": {
      "editMode": "inpainting-insert",
      "maskMode": {
        "maskType": "MASK_TYPE"
      },
      "guidanceScale": GUIDANCE_SCALE_VALUE
    }
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagegeneration@006:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagegeneration@006:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}

Limitations

The following sections describe limitations of the object insertion feature in Imagen.

Modified pixels

The model might modify pixels outside of the masked area. These changes are usually minor. The generated pixels are also at the model's native resolution (for example, 1024x1024).

For perfect preservation of the unmasked area, blend the generated image with the original input image using the mask. Blending is recommended if your input image resolution is 2K or higher.
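
One way to do this blend is with the Pillow library (an assumption; it isn't part of the Imagen API): composite the generated image over the original so that generated pixels are used only where the mask is white and original pixels are kept everywhere else. The following sketch assumes a black-and-white mask in which white marks the edited area; file names are placeholders.

# A sketch for preserving unmasked pixels by compositing the generated
# image onto the original using the mask. Assumes white mask pixels mark
# the edited area; file names are placeholders.
from PIL import Image

original = Image.open("input-image.png").convert("RGB")
generated = Image.open("output-image.png").convert("RGB")
mask = Image.open("mask-image.png").convert("L")

# Match the generated image and the mask to the original's resolution.
if generated.size != original.size:
    generated = generated.resize(original.size)
if mask.size != original.size:
    mask = mask.resize(original.size)

# Take generated pixels where the mask is white, original pixels elsewhere.
blended = Image.composite(generated, original, mask)
blended.save("blended-image.png")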

Insert limitation

The insertion feature typically matches the style of the base image. However, some keywords might cause the model to generate images in a cartoon style, even if you intend a photorealistic output.

For example, a prompt for a yellow giraffe might produce a cartoon-like image because photorealistic giraffes are typically brown and tan. Similarly, generating photorealistic images with unnatural colors can be difficult.

What's next

Read articles about Imagen and other Generative AI on Vertex AI products.