Edit images with Gemini

This page shows you how to use Gemini to generate images.

Gemini 2.5 Flash Image Preview supports response generation in multiple modalities, including text and images.

Image generation capabilities

The Gemini model offers several ways to work with images. The following table compares these functionalities to help you choose the best one for your use case.

| Functionality | Description | Use case |
| --- | --- | --- |
| Generate images | Create a new image from only a text prompt. | Creating a completely new visual from an idea or concept. |
| Generate interleaved images and text | Produce a single response that contains both generated text and relevant, newly created images. | Creating tutorials, recipes, or stories where visuals are needed to illustrate steps or concepts described in the text. |

The Gemini 2.5 Flash Image public preview (gemini-2.5-flash-image-preview) can generate images in addition to text. This expands Gemini's capabilities to include the following:

  • Iteratively generate images through conversation with natural language, adjusting images while maintaining consistency and context.
  • Generate images with high-quality long text rendering.
  • Generate interleaved text-image output. For example, a blog post with text and images in a single turn. Previously, this required stringing together multiple models.
  • Generate images using Gemini's world knowledge and reasoning capabilities.

With this public preview release, Gemini 2.5 Flash Image generates images at 1024 px resolution, supports generating images of people, and includes updated safety filters that provide a more flexible and less restrictive user experience.

It supports the following modalities and capabilities:

  • Text to image

    • Example prompt: "Generate an image of the Eiffel tower with fireworks in the background."
  • Text to image (text rendering)

    • Example prompt: "generate a cinematic photo of a large building with this giant text projection mapped on the front of the building: "Gemini 2.5 can now generate long form text""
  • Text to image(s) and text (interleaved)

    • Example prompt: "Generate an illustrated recipe for a paella. Create images alongside the text as you generate the recipe."
    • Example prompt: "Generate a story about a dog in a 3D cartoon animation style. For each scene, generate an image"
  • Image(s) and text to image(s) and text (interleaved)

    • Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
  • Image editing (text and image to image)

    • Example prompt: "Edit this image to make it look like a cartoon"
    • Example prompt: [image of a cat] + [image of a pillow] + "Create a cross stitch of my cat on this pillow."
  • Multi-turn image editing (chat)

    • Example prompts: [upload an image of a blue car.] "Turn this car into a convertible." "Now change the color to yellow."
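
The multi-turn editing flow above can be sketched with the Gen AI SDK's chat interface, which keeps the conversation history so that each edit builds on the previous image. The following is a minimal Python sketch under the same environment setup as the samples later on this page; the input file my-blue-car.jpg and the output file names are placeholder assumptions.

from io import BytesIO

from google import genai
from google.genai.types import GenerateContentConfig, Modality, Part
from PIL import Image

client = genai.Client()

# A chat session keeps the conversation history, so each request edits the
# result of the previous turn instead of starting over.
chat = client.chats.create(
    model="gemini-2.5-flash-image-preview",
    config=GenerateContentConfig(response_modalities=[Modality.TEXT, Modality.IMAGE]),
)

# First turn: send the source image (placeholder path) with an edit request.
with open("my-blue-car.jpg", "rb") as f:  # hypothetical input image
    car_image = Part.from_bytes(data=f.read(), mime_type="image/jpeg")
response = chat.send_message([car_image, "Turn this car into a convertible."])

# Second turn: refine the previous result in plain language.
response = chat.send_message("Now change the color to yellow.")

# Save any image parts from the final turn.
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        Image.open(BytesIO(part.inline_data.data)).save(f"edited-car-{i}.png")

Because the chat object carries the history, the second send_message call edits the convertible from the first turn rather than starting again from the original photo.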

Limitations:

  • For best performance, use the following languages: en, es-MX, ja-JP, zh-CN, and hi-IN.
  • Image generation does not support audio or video inputs.
  • Image generation may not always trigger (a defensive check for this case is sketched after this list):
    • The model may output text only. Try asking for image outputs explicitly. For example, "provide images as you go along."
    • The model may generate text as an image. Try asking for text outputs explicitly. For example, "generate narrative text along with illustrations."
    • The model may stop generating partway through. Try again or try a different prompt.
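
Because a response is not guaranteed to contain an image, it can help to check the returned parts and retry with a more explicit instruction. The following is a minimal Python sketch under the same environment setup as the samples later on this page; the retry wording is one suggestion, not an official mitigation.

from google import genai
from google.genai.types import GenerateContentConfig, Modality

client = genai.Client()
config = GenerateContentConfig(response_modalities=[Modality.TEXT, Modality.IMAGE])

def generate_with_image(prompt: str, max_attempts: int = 2):
    """Retry once, asking for images explicitly, if none were returned."""
    response = None
    for _ in range(max_attempts):
        response = client.models.generate_content(
            model="gemini-2.5-flash-image-preview", contents=prompt, config=config
        )
        parts = response.candidates[0].content.parts
        if any(part.inline_data for part in parts):
            break
        # No image part came back; make the request for images explicit.
        prompt = f"{prompt} Provide images as you go along."
    return response

response = generate_with_image("Generate an illustrated recipe for a paella.")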

Generate images

You can generate images using either Vertex AI Studio or the API.

| Method | Description | Pros | Cons |
| --- | --- | --- | --- |
| Vertex AI Studio | A web-based UI for building and experimenting with generative AI models. | Easy to use, no coding required, good for rapid prototyping. | Less suitable for automation or integration into applications. |
| API (REST & Python SDK) | A programmatic interface to integrate Gemini features into your applications. | Full control, enables automation and deep integration. | Requires coding and environment setup. |

For guidance and best practices for prompting, see Design multimodal prompts.

Console

To generate an image:

  1. Go to the Vertex AI Studio > Create prompt page.
  2. Click Switch model and select gemini-2.5-flash-image-preview from the menu.
  3. In the Outputs panel, select Image and text from the drop-down menu.
  4. In the Write a prompt text area, write a description of the image you want to generate.
  5. Click Submit.

Gemini generates an image based on your description. This process usually takes a few seconds but might be slower depending on the current capacity.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

import os
from io import BytesIO

from google import genai
from google.genai.types import GenerateContentConfig, Modality
from PIL import Image

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="Generate an image of the Eiffel tower with fireworks in the background.",
    config=GenerateContentConfig(
        response_modalities=[Modality.TEXT, Modality.IMAGE],
        candidate_count=1,
        safety_settings=[
            {
                "method": "PROBABILITY",
                "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                "threshold": "BLOCK_MEDIUM_AND_ABOVE",
            }
        ],
    ),
)

# Print text parts and save image parts to disk.
os.makedirs("output_folder", exist_ok=True)
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("output_folder/example-image-eiffel-tower.png")
# Example response:
#   I will generate an image of the Eiffel Tower at night, with a vibrant display of
#   colorful fireworks exploding in the dark sky behind it. The tower will be
#   illuminated, standing tall as the focal point of the scene, with the bursts of
#   light from the fireworks creating a festive atmosphere.

Node.js

Install

npm install @google/genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

const fs = require('fs');
const {GoogleGenAI, Modality} = require('@google/genai');

const GOOGLE_CLOUD_PROJECT = process.env.GOOGLE_CLOUD_PROJECT;
const GOOGLE_CLOUD_LOCATION =
  process.env.GOOGLE_CLOUD_LOCATION || 'us-central1';

async function generateContent(
  projectId = GOOGLE_CLOUD_PROJECT,
  location = GOOGLE_CLOUD_LOCATION
) {
  const ai = new GoogleGenAI({
    vertexai: true,
    project: projectId,
    location: location,
  });

  const response = await ai.models.generateContentStream({
    model: 'gemini-2.5-flash-image-preview',
    contents:
      'Generate an image of the Eiffel tower with fireworks in the background.',
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  const generatedFileNames = [];
  let imageIndex = 0;
  for await (const chunk of response) {
    const text = chunk.text;
    const data = chunk.data;
    if (text) {
      console.debug(text);
    } else if (data) {
      const fileName = `generate_content_streaming_image_${imageIndex++}.png`;
      console.debug(`Writing response image to file: ${fileName}.`);
      try {
        fs.writeFileSync(fileName, data);
        generatedFileNames.push(fileName);
      } catch (error) {
        console.error(`Failed to write image file ${fileName}:`, error);
      }
    }
  }

  return generatedFileNames;
}

REST

To generate an image, send a POST request to the generateContent method using the following cURL command. Set API_ENDPOINT to your model's publisher endpoint; a typical form is LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.5-flash-image-preview.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${API_ENDPOINT}:generateContent" \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Generate an image of the Eiffel tower with fireworks in the background." }
    },
    "generation_config": {
      "response_modalities": ["TEXT", "IMAGE"]
    },
    "safetySettings": {
      "method": "PROBABILITY",
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  }' 2>/dev/null >response.json

Gemini generates an image based on your description. This process usually takes a few seconds but might be slower depending on the current capacity.
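
Each generated image in response.json is returned as base64-encoded bytes in an inlineData part of the generateContent response. The following minimal Python sketch prints the text parts and writes the image parts to disk; the output file names are placeholders.

import base64
import json

with open("response.json") as f:
    response = json.load(f)

# Each part is either text or a base64-encoded image.
for i, part in enumerate(response["candidates"][0]["content"]["parts"]):
    if "text" in part:
        print(part["text"])
    elif "inlineData" in part:
        with open(f"rest-image-{i}.png", "wb") as out:
            out.write(base64.b64decode(part["inlineData"]["data"]))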

Generate interleaved images and text

Gemini 2.5 Flash Image Preview can generate interleaved images with its text responses. For example, you can ask the model to create a recipe and also generate an image for each step. This avoids making separate requests for the text and each image.

Console

To generate interleaved images and text:

  1. Go to the Vertex AI Studio > Create prompt page.
  2. Click Switch model and select gemini-2.5-flash-image-preview from the menu.
  3. In the Outputs panel, select Image and text from the drop-down menu.
  4. In the Write a prompt text area, write a description of the response you want to generate. For example: "Create a tutorial explaining how to make a peanut butter and jelly sandwich in three easy steps. For each step, provide a title with the number of the step, an explanation, and also generate an image, generate each image in a 1:1 aspect ratio."
  5. Click Submit.

Gemini generates a response that includes text and images based on your description. This process usually takes a few seconds but might be slower depending on the current capacity.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

import os
from io import BytesIO

from google import genai
from google.genai.types import GenerateContentConfig, Modality
from PIL import Image

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=(
        "Generate an illustrated recipe for a paella. "
        "Create images to go alongside the text as you generate the recipe."
    ),
    config=GenerateContentConfig(response_modalities=[Modality.TEXT, Modality.IMAGE]),
)

os.makedirs("output_folder", exist_ok=True)
with open("output_folder/paella-recipe.md", "w") as fp:
    for i, part in enumerate(response.candidates[0].content.parts):
        if part.text is not None:
            fp.write(part.text)
        elif part.inline_data is not None:
            # Save each image next to the markdown file and reference it inline.
            image = Image.open(BytesIO(part.inline_data.data))
            image.save(f"output_folder/example-image-{i+1}.png")
            fp.write(f"![image](example-image-{i+1}.png)")
# Example response:
#   A markdown page for a paella recipe (`paella-recipe.md`) has been generated.
#   It includes detailed steps and several images illustrating the cooking process.

REST

To generate interleaved images and text, send a POST request to the generateContent method using the following cURL command:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${API_ENDPOINT}:generateContent" \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Create a tutorial explaining how to make a peanut butter and jelly sandwich in three easy steps. For each step, provide a title with the number of the step, an explanation, and also generate an image, generate each image in a 1:1 aspect ratio." }
    },
    "generation_config": {
      "response_modalities": ["TEXT", "IMAGE"]
    },
    "safetySettings": {
      "method": "PROBABILITY",
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  }' 2>/dev/null >response.json

Gemini generates a response that includes text and images based on your description. This process usually takes a few seconds but might be slower depending on the current capacity.

Locale-aware image generation

Gemini 2.5 Flash Image Preview can also take your location into account when generating text or image responses. For example, you can generate images of locations or experiences that are typical for your current location without having to describe that location to the model.

Console

To use locale-aware image generation:

  1. Go to the Vertex AI Studio > Create prompt page.
  2. Click Switch model and select gemini-2.5-flash-image-preview from the menu.
  3. In the Outputs panel, select Image and text from the drop-down menu.
  4. In the Write a prompt text area, write a description of the image you want to generate. For example, "Generate a photo of a typical breakfast."
  5. Click Submit.

Gemini generates a response based on your description. This process usually takes a few seconds but might be slower depending on the current capacity.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

import os
from io import BytesIO

from google import genai
from google.genai.types import GenerateContentConfig, Modality
from PIL import Image

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="Generate a photo of a breakfast meal.",
    config=GenerateContentConfig(response_modalities=[Modality.TEXT, Modality.IMAGE]),
)

# Print text parts and save image parts to disk.
os.makedirs("output_folder", exist_ok=True)
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("output_folder/example-breakfast-meal.png")
# Example response:
#   Generates a photo of a vibrant and appetizing breakfast meal.
#   The scene will feature a white plate with golden-brown pancakes
#   stacked neatly, drizzled with rich maple syrup and ...

REST

To generate a locale-aware image, send a POST request to the generateContent method using the following cURL command. The response is written to response.json in the current directory, overwriting any existing file:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${API_ENDPOINT}:generateContent" \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Generate a photo of a typical breakfast." }
    },
    "generation_config": {
      "response_modalities": ["TEXT", "IMAGE"]
    },
    "safetySettings": {
      "method": "PROBABILITY",
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  }' 2>/dev/null >response.json

Gemini generates an image based on your description. This process usually takes a few seconds but might be slower depending on the current capacity.