Vertex AI video generation prompt guide

This guide shows you how to write effective prompts for video generation with Vertex AI Veo.

Prompt guide overview

Vertex AI Veo is a model that generates video from text (text-to-video) or from an image and text (image-to-video). To generate a video, you provide a prompt, which is a text description of what you want the model to create.

Safety filters

Veo applies safety filters across Vertex AI to help ensure that generated videos and uploaded photos don't contain offensive content. For example, prompts that violate responsible AI guidelines are blocked.

If you suspect abuse of Veo, or find generated output that contains inappropriate material or inaccurate information, use the "Report suspected abuse on Google Cloud" form.

Best practices for writing prompts

Good prompts are descriptive and clear. To get your generated video closer to what you want, start by identifying your core idea and then refine it by adding keywords and modifiers.

Core prompt elements

For best results, include the following elements in your prompt:

  • Subject: The object, person, animal, or scenery that you want in your video.
  • Context: The background or setting in which the subject is placed.
  • Action: What the subject is doing (for example, walking, running, or turning their head).
  • Style: This can be general or very specific. Consider using specific film style keywords, such as horror film, film noir, or animated styles like cartoon style render.
  • Camera motion (Optional): What the camera is doing, such as an aerial view, eye-level, top-down shot, or low-angle shot.
  • Composition (Optional): How the shot is framed, such as a wide shot, close-up, or extreme close-up.
  • Ambiance (Optional): How color and light contribute to the scene, such as blue tones, night, or warm tones.
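
As a rough illustration of how the elements above can be combined, the following Python sketch joins them into a single prompt string. The `PromptElements` class and `build_prompt` function are hypothetical helpers written for this guide, not part of any Vertex AI SDK, and the word order they produce is only one reasonable choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptElements:
    """Hypothetical container for the core prompt elements (not a Vertex AI type)."""
    subject: str
    context: str
    action: str
    style: str
    camera_motion: Optional[str] = None
    composition: Optional[str] = None
    ambiance: Optional[str] = None


def build_prompt(e: PromptElements) -> str:
    """Join the elements into one descriptive sentence, skipping unset optional ones."""
    # Lead with framing and camera cues, then describe the scene itself.
    lead = ", ".join(p for p in (e.composition, e.camera_motion) if p)
    scene = f"{e.subject} {e.action} {e.context}"
    sentence = f"{lead} of {scene}" if lead else scene
    # Style and ambiance read naturally as trailing modifiers.
    tail = ", ".join(p for p in (e.style, e.ambiance) if p)
    sentence = f"{sentence}, {tail}"
    return sentence[0].upper() + sentence[1:] + "."


prompt = build_prompt(PromptElements(
    subject="a woman",
    context="along the beach at sunset",
    action="walking",
    style="cinematic",
    composition="wide shot",
    ambiance="warm tones",
))
print(prompt)
```

Keeping the elements in separate fields makes it easy to change one of them, such as the ambiance, and regenerate the prompt without rewriting the whole sentence.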

Examples of core elements

These examples show how to refine your prompts by specifying core elements.

Subject description

The description can include one subject or several, along with their actions.

Prompt: An architectural rendering of a white concrete apartment building with flowing organic shapes, seamlessly blending with lush greenery and futuristic elements.
Generated output: A white concrete apartment building.

Context

The background or context for your subject is an important part of the prompt. Try placing your subject in different backgrounds, such as on a busy street or in outer space.

Prompt: A satellite floating through outer space with the moon and some stars in the background.
Generated output: Satellite floating in the atmosphere.

Action

Specify what the subject is doing, such as walking, running, or turning their head.

Prompt: A wide shot of a woman walking along the beach, looking content and relaxed and looking towards the horizon at sunset.
Generated output: A woman walking on the beach at sunset.

Style

You can add keywords to improve generation quality and steer it closer to the intended style, such as shallow depth of field, movie still, minimalistic, surreal, vintage, futuristic, or double-exposure.

Prompt: Film noir style, man and woman walk on the street, mystery, cinematic, black and white.
Generated output: A man and woman in film noir style.

Camera motion

Specify camera movements like POV shot, aerial view, tracking drone view, or tracking shot.

Prompt: A POV shot from a vintage car driving in the rain, Canada at night, cinematic.
Generated output: A point-of-view shot from a car driving in the rain.

Composition

Specify how the shot is framed, such as wide shot, close-up, or low angle.

Prompt: Extreme close-up of an eye with a city reflected in it.
Generated output: A close-up of an eye.

Prompt: Create a video of a wide shot of a surfer walking on a beach with a surfboard, beautiful sunset, cinematic.
Generated output: A wide shot of a surfer on a beach.

Ambiance

Color and light help make a video distinctive, and the color palette shapes its mood and emotional impact. For example, a warm, golden palette can create a romantic and atmospheric feel. Examples of ambiance keywords include pastel blue and pink tones, dim ambient lighting, and cold muted tones.

Prompt: A close-up of a girl holding an adorable golden retriever puppy in the park, sunlight.
Generated output: A puppy in a young girl's arms.

Prompt: Cinematic close-up shot of a sad woman riding a bus in the rain, cool blue tones, sad mood.
Generated output: A sad woman riding on a bus.

General tips

  • Use descriptive language: Use adjectives and adverbs to create a clear picture for Veo.
  • Provide context: Include background information to help the model understand your request.
  • Reference specific artistic styles: If you have a particular aesthetic in mind, mention specific artistic styles or art movements.
  • Use prompt engineering tools: Explore prompt engineering tools or resources to help you refine your prompts. For more information, see Introduction to prompting.
  • Enhance facial details: To improve facial details, specify them as a focus by using words like portrait in your prompt.

Examples of prompt refinement

This section shows how adding detail to your prompts can refine the generated video.

Icicles

This video demonstrates how you can use each of the core elements in your prompt.

Prompt: Close up shot (composition) of melting icicles (subject) on a frozen rock wall (context) with cool blue tones (ambiance), zoomed in (camera motion) maintaining close-up detail of water drips (action).
Generated output: Dripping icicles with a blue background.

Man on the phone

These videos demonstrate how you can revise your prompt with more specific details.

Prompt: The camera dollies to show a close up of a desperate man in a green trench coat that's making a call on a rotary-style wall phone with a green neon light and a movie scene.
Generated output: Man talking on the phone.
Analysis: This is the initial video generated from the prompt.

Prompt: A close-up cinematic shot follows a desperate man in a weathered green trench coat as he dials a rotary phone mounted on a gritty brick wall, bathed in the eerie glow of a green neon sign. The camera dollies in, revealing the tension in his jaw and the desperation etched on his face as he struggles to make the call. The shallow depth of field focuses on his furrowed brow and the black rotary phone, blurring the background into a sea of neon colors and indistinct shadows, creating a sense of urgency and isolation.
Generated output: Man talking on the phone.
Analysis: This more detailed prompt results in a video that is more focused and has a richer environment.

Prompt: A video with smooth motion that dollies in on a desperate man in a green trench coat, using a vintage rotary phone against a wall bathed in an eerie green neon glow. The camera starts from a medium distance, slowly moving closer to the man's face, revealing his frantic expression and the sweat on his brow as he urgently dials the phone. The focus is on the man's hands, his fingers fumbling with the dial as he desperately tries to connect. The green neon light casts long shadows on the wall, adding to the tense atmosphere. The scene is framed to emphasize the isolation and desperation of the man, highlighting the stark contrast between the vibrant glow of the neon and the man's grim determination.
Generated output: Man talking on the phone.
Analysis: Adding even more detail gives the subject a realistic expression and creates an intense, vibrant scene.

Snow leopard

The following example shows how adding more detail can generate output that is closer to what you want.

Prompt: A cute creature with snow leopard-like fur is walking in winter forest, 3D cartoon style render.
Generated output: A cartoon snow leopard walking.

Prompt: Create a short 3D animated scene in a joyful cartoon style. A cute creature with snow leopard-like fur, large expressive eyes, and a friendly, rounded form happily prances through a whimsical winter forest. The scene should feature rounded, snow-covered trees, gentle falling snowflakes, and warm sunlight filtering through the branches. The creature's bouncy movements and wide smile should convey pure delight. Aim for an upbeat, heartwarming tone with bright, cheerful colors and playful animation. Consider adding subtle, whimsical sound effects to enhance the joyful winter atmosphere.
Generated output: A happy cartoon snow leopard running.

Add audio to your video

Audio is supported by veo-3.0-generate-001 in Preview.

Clearly specify if you want audio. We recommend using separate sentences in your prompt to describe the audio. The following are examples of audio descriptions in a prompt:

  • Sound effects:

    • The audio features water splashing in the background.
    • Add soft music in the background.
  • Speech:

    • The man in the red hat says, "Where is the rabbit?" Then the woman in the green dress next to him replies, "There, in the woods."
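
One way to follow the separate-sentences recommendation is to keep the visual description and the audio cues apart until the final prompt is assembled. The sketch below assumes a hypothetical helper named `add_audio`; it only manipulates strings and does not call Veo. The visual description in the usage example is invented for illustration.

```python
def add_audio(visual_prompt: str, sound_effects=(), speech=()) -> str:
    """Append each audio cue to the visual prompt as its own sentence."""
    sentences = [visual_prompt.strip()]
    for cue in (*sound_effects, *speech):
        cue = cue.strip()
        # Close the cue with a period unless it already ends a sentence or a quote.
        if not cue.endswith((".", "!", "?", '"')):
            cue += "."
        sentences.append(cue)
    return " ".join(sentences)


full_prompt = add_audio(
    "A man in a red hat and a woman in a green dress stand at the edge of the woods.",
    sound_effects=["Add soft music in the background"],
    speech=['The man in the red hat says, "Where is the rabbit?"'],
)
print(full_prompt)
```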

Use reference images to generate videos

You can bring images to life by using the image-to-video capability in Veo. You can use your existing assets or use Imagen to generate something new.

Prompt: Bunny with a chocolate candy bar.
Generated output: A static image of a bunny with a chocolate bar.

Prompt: Bunny runs away.
Generated output: An animated video of the bunny running away.

When you use image-to-video, we recommend the following:

  • Ensure that your action and speech descriptions align with each subject in the input image.
  • If the input image has multiple subjects, clearly specify which character is performing an action or speaking. To differentiate between characters, use their most distinguishing descriptive details. For example:
    • The man in the red hat.
    • The woman in the blue dress.

Negative prompts

Negative prompts specify which elements to exclude from your video.

  • ❌ Avoid instructive language or words like no or don't. For example, "No walls" or "don't show walls".
  • ✅ Describe what you don't want to see. For example, to exclude a wall or a frame from the video, use "wall, frame".

Prompt: Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should feature a gentle, atmospheric soundtrack and use a warm, inviting color palette.
Generated output: A tree with leaves blowing in the wind.

Prompt: Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should feature a gentle, atmospheric soundtrack and use a warm, inviting color palette.
Negative prompt: urban background, man-made structures, dark, stormy, or threatening atmosphere.
Generated output: A tree with leaves blowing, without an urban background.
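
The negative prompt guidance can be checked mechanically before a request is sent. The following sketch is a minimal, assumed approach: `check_negative_prompt` and `join_negative_terms` are hypothetical helpers, and the word list covers only the instructive words named in this guide ("no", "don't") plus "do not".

```python
import re

# Instructive words to avoid in negative prompts; "do not" is an added assumption.
INSTRUCTIVE = re.compile(r"\b(?:no|don't|do not)\b", re.IGNORECASE)


def check_negative_prompt(text: str) -> list[str]:
    """Return the instructive words found in a negative prompt (empty if none)."""
    return [m.group(0).lower() for m in INSTRUCTIVE.finditer(text)]


def join_negative_terms(terms: list[str]) -> str:
    """Build a comma-separated negative prompt from plain descriptions."""
    cleaned = [t.strip() for t in terms if t.strip()]
    for term in cleaned:
        found = check_negative_prompt(term)
        if found:
            raise ValueError(
                f"Describe the element instead of using {found!r}: {term!r}"
            )
    return ", ".join(cleaned)


print(check_negative_prompt("No walls"))
print(join_negative_terms(["wall", "frame"]))
```

A check like this catches phrasing such as "don't show walls" early, so it can be rewritten as a plain description like "wall, frame".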

Set aspect ratios

Vertex AI Veo supports the following two aspect ratios for video generation:

Aspect ratio: Widescreen or 16:9
Description: The 16:9 aspect ratio is the most common for televisions, monitors, and mobile phone screens in landscape orientation. Use this ratio to capture more of the background, such as scenic landscapes.

Aspect ratio: Portrait or 9:16
Description: This is a rotated widescreen format, popularized by short-form video applications like YouTube Shorts. Use this aspect ratio for portraits or tall objects with strong vertical orientations, such as buildings, trees, or waterfalls.
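
Because only two aspect ratios are supported, it can help to validate the value before building a request. The sketch below is illustrative only: `make_settings` is a hypothetical helper, and the dictionary keys (`prompt`, `aspect_ratio`, `negative_prompt`) are placeholder names chosen for this example, not the Vertex AI request schema.

```python
from typing import Optional

# The two aspect ratios this guide lists as supported.
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16")


def make_settings(prompt: str, aspect_ratio: str = "16:9",
                  negative_prompt: Optional[str] = None) -> dict:
    """Bundle generation settings, rejecting unsupported aspect ratios."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(
            f"Unsupported aspect ratio {aspect_ratio!r}; "
            f"choose one of {SUPPORTED_ASPECT_RATIOS}"
        )
    # Key names are placeholders for illustration, not real API fields.
    settings = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    if negative_prompt:
        settings["negative_prompt"] = negative_prompt
    return settings


settings = make_settings(
    "A majestic Hawaiian waterfall within a lush rainforest.",
    aspect_ratio="9:16",
)
print(settings)
```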

Widescreen (16:9)

The following is an example prompt for the widescreen aspect ratio.

Prompt: Create a video with a tracking drone view of a man driving a red convertible car in Palm Springs, 1970s, warm sunlight, long shadows.
Generated output: A man driving a red convertible in widescreen.

Portrait (9:16)

The following is an example prompt for the portrait aspect ratio.

Prompt: Create a video with a smooth motion of a majestic Hawaiian waterfall within a lush rainforest. Focus on realistic water flow, detailed foliage, and natural lighting to convey tranquility. Capture the rushing water, misty atmosphere, and dappled sunlight filtering through the dense canopy. Use smooth, cinematic camera movements to showcase the waterfall and its surroundings. Aim for a peaceful, realistic tone, transporting the viewer to the serene beauty of the Hawaiian rainforest.
Generated output: A Hawaiian waterfall in portrait aspect ratio.