Gemini 2.0 Flash supports response generation in multiple modalities, including text, speech, and images.
Text generation
Gemini 2.0 Flash supports text generation using the Google Cloud console, REST API, and supported SDKs. For more information, see our text generation guide.
Speech generation (private experimental)
Gemini 2.0 supports a new multimodal generation capability: text to speech. Using the text-to-speech capability, you can prompt the model to generate high-quality audio output that sounds like a human voice (for example, "Say 'hi everyone'"), and you can further refine the output by steering the voice.
Generate speech
The following sections cover how to generate speech using either Vertex AI Studio or the API.
For guidance and best practices for prompting, see Design multimodal prompts.
Using Vertex AI Studio
To use speech generation:

1. Open Vertex AI Studio > Freeform.
2. Select gemini-2.0-flash-exp from the Models drop-down menu.
3. In the Response panel, select Audio from the drop-down menu.
4. Write a description of the speech you want to generate in the text area of the Prompt panel.
5. Click the Prompt button.
Gemini will generate speech based on your description. This process should take a few seconds, but may be comparatively slower depending on capacity.
Using the API
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

```
cat << EOF > request.json
{
  "contents": [
    {
      "role": "user",
      "parts": [
        { "text": "Say, 'How are you?'" }
      ]
    }
  ],
  "generation_config": {
    "response_modalities": ["AUDIO"]
  },
  "safety_settings": [
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE" },
    { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE" },
    { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE" }
  ]
}
EOF
```
Then execute the following command to send your REST request:
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-exp:generateContent" \
  -d @request.json
```
Gemini will generate audio based on your description. This process should take a few seconds, but may be comparatively slower depending on capacity.
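The REST response returns generated media as base64-encoded inline-data parts. As a minimal sketch of post-processing the response, the following decodes any audio parts and writes the raw bytes to a file. The field names (`candidates`, `content`, `parts`, `inlineData`, `mimeType`, `data`) follow the generateContent JSON shape; the sample response here is fabricated for illustration, so verify the exact structure against the current API reference:

```python
import base64
import json

def save_audio_parts(response_json: str, out_path: str = "output.pcm") -> int:
    """Decode base64 audio parts from a generateContent response and
    write the concatenated raw bytes to out_path. Returns bytes written."""
    response = json.loads(response_json)
    audio = b""
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and inline.get("mimeType", "").startswith("audio/"):
                audio += base64.b64decode(inline["data"])
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)

# Fabricated sample response, standing in for the real curl output:
sample = json.dumps({
    "candidates": [{
        "content": {"role": "model", "parts": [{
            "inlineData": {
                "mimeType": "audio/l16",
                "data": base64.b64encode(b"\x00\x01\x02\x03").decode(),
            }
        }]}
    }]
})
print(save_audio_parts(sample))  # → 4
```

In practice you would pipe the curl output into this function instead of the fabricated sample, and pick a file extension matching the returned mimeType.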
Image generation (public experimental)
Gemini 2.0 Flash Experimental Image Generation (gemini-2.0-flash-exp) can generate images in addition to text. This expands Gemini's capabilities to include the following:
- Iteratively generate images through conversation with natural language, adjusting images while maintaining consistency and context.
- Generate images with high-quality long text rendering.
- Generate interleaved text-image output. For example, a blog post with text and images in a single turn. Previously, this required stringing together multiple models.
- Generate images using Gemini's world knowledge and reasoning capabilities.
With this public experimental release, Gemini 2.0 Flash Experimental Image Generation can generate images at a resolution of 1024 px, supports generating and editing images of people, and includes updated safety filters that provide a more flexible and less restrictive user experience.
It supports the following modalities and capabilities:
Text to image
- Example prompt: "Generate an image of the Eiffel tower with fireworks in the background."
Text to image (text rendering)
- Example prompt: "Generate a cinematic photo of a large building with this giant text projection mapped on the front of the building: 'Gemini 2.0 can now generate long form text'"
Text to image(s) and text (interleaved)
- Example prompt: "Generate an illustrated recipe for a paella. Create images alongside the text as you generate the recipe."
- Example prompt: "Generate a story about a dog in a 3D cartoon animation style. For each scene, generate an image."
Image(s) and text to image(s) and text (interleaved)
- Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
Image editing (text and image to image)
- Example prompt: "Edit this image to make it look like a cartoon"
- Example prompt: [image of a cat] + [image of a pillow] + "Create a cross stitch of my cat on this pillow."
Multi-turn image editing (chat)
- Example prompts: [upload an image of a blue car.] "Turn this car into a convertible." "Now change the color to yellow."
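In a multi-turn editing session over the REST API, each request resends the conversation so far in the `contents` array, with inline images carried as base64 data. The following is an illustrative sketch of assembling such a request body for the car example above; the `user_turn` helper and the placeholder image bytes are hypothetical, and the field names assume the generateContent JSON shape (in a real session, the model's previous reply, including its generated image, would be echoed back as a `"role": "model"` turn between the two user turns):

```python
import base64
import json

def user_turn(text=None, image_bytes=None, mime_type="image/png"):
    """Build one user turn containing an optional inline image and/or text."""
    parts = []
    if image_bytes is not None:
        parts.append({"inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(image_bytes).decode(),
        }})
    if text is not None:
        parts.append({"text": text})
    return {"role": "user", "parts": parts}

# Placeholder bytes stand in for the real uploaded car photo.
contents = [
    user_turn(text="Turn this car into a convertible.",
              image_bytes=b"<car photo bytes>"),
    # ...model turn with its generated image would be echoed back here...
    user_turn(text="Now change the color to yellow."),
]
request = {
    "contents": contents,
    "generation_config": {"response_modalities": ["IMAGE", "TEXT"]},
}
body = json.dumps(request)  # serialized request body, ready to POST
```

The point of the sketch is the shape of the history: later edits only make sense to the model because the earlier turns (and their images) travel with every request.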
Limitations:
- For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
- Image generation does not support audio or video inputs.
- Image generation may not always trigger:
- The model may output text only. Try asking for image outputs explicitly. For example, "provide images as you go along."
- The model may generate text as an image. Try asking for text outputs explicitly. For example, "generate narrative text along with illustrations."
- The model may stop generating partway through. Try again or try a different prompt.
Generate images
The following sections cover how to generate images using either Vertex AI Studio or the API.
For guidance and best practices for prompting, see Design multimodal prompts.
Using Vertex AI Studio
To use image generation:

1. Open Vertex AI Studio > Freeform.
2. Select gemini-2.0-flash-exp from the Models drop-down menu.
3. In the Response panel, select Image and text from the drop-down menu.
4. Write a description of the image you want to generate in the text area of the Prompt panel.
5. Click the Prompt button.
Gemini will generate an image based on your description. This process should take a few seconds, but may be comparatively slower depending on capacity.
Using the API
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

```
cat << EOF > request.json
{
  "contents": [
    {
      "role": "user",
      "parts": [
        { "text": "Generate an image of a cat." }
      ]
    }
  ],
  "generation_config": {
    "response_modalities": ["IMAGE", "TEXT"]
  },
  "safety_settings": [
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE" },
    { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE" },
    { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE" }
  ]
}
EOF
```
Then execute the following command to send your REST request:
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-exp:generateContent" \
  -d @request.json
```
Gemini will generate an image based on your description. This process should take a few seconds, but may be comparatively slower depending on capacity.
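Because the response can interleave text and image parts, post-processing means walking the parts and decoding any inline image data. A minimal sketch with a fabricated response; the field names assume the generateContent JSON shape and should be checked against the current API reference:

```python
import base64

def extract_images(response: dict):
    """Yield (mime_type, raw_bytes) for each inline image part in a
    generateContent response, skipping interleaved text parts."""
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and inline.get("mimeType", "").startswith("image/"):
                yield inline["mimeType"], base64.b64decode(inline["data"])

# Fabricated interleaved response: one text part, one image part.
sample = {"candidates": [{"content": {"role": "model", "parts": [
    {"text": "Here is your cat."},
    {"inlineData": {"mimeType": "image/png",
                    "data": base64.b64encode(b"\x89PNG...").decode()}},
]}}]}

for mime, data in extract_images(sample):
    print(mime, len(data))  # → image/png 7
```

A real response would carry full PNG bytes in `data`; writing each tuple to a file with an extension matching the mime type recovers the generated images.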