Use negative prompts: Understand how to exclude unwanted elements from your video.
Set aspect ratios: Learn how to specify widescreen or portrait aspect ratios for your output.
Prompt guide overview
Vertex AI Veo is a model that generates video from text (text-to-video) or from an image and text (image-to-video). To generate a video, you provide a prompt, which is a text description of what you want the model to create.
Safety filters
Veo applies safety filters across Vertex AI to help ensure that generated videos and uploaded photos don't contain offensive content. For example, prompts that violate responsible AI guidelines are blocked.
Good prompts are descriptive and clear. To get your generated video closer to what you want, start by identifying your core idea and then refine it by adding keywords and modifiers.
Core prompt elements
For best results, include the following elements in your prompt:
Subject: The object, person, animal, or scenery that you want in your video.
Context: The background or setting in which the subject is placed.
Action: What the subject is doing (for example, walking, running, or turning their head).
Style: This can be general or very specific. Consider using specific film style keywords, such as horror film, film noir, or animated styles like cartoon style render.
Camera motion (Optional): What the camera is doing, such as an aerial view, eye-level shot, top-down shot, or low-angle shot.
Composition (Optional): How the shot is framed, such as a wide shot, close-up, or extreme close-up.
Ambiance (Optional): How color and light contribute to the scene, such as blue tones, night, or warm tones.
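As a rough illustration of how the elements in the preceding list can be combined into one prompt string, consider the following Python sketch. The `PromptElements` class and `build_prompt` function are hypothetical helpers written for this guide, not part of any Veo or Vertex AI SDK, and the comma-separated ordering is only one reasonable convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptElements:
    """Hypothetical container for the core prompt elements (not an SDK type)."""
    subject: str
    context: str
    action: str
    style: str
    camera_motion: Optional[str] = None  # optional element
    composition: Optional[str] = None    # optional element
    ambiance: Optional[str] = None       # optional element


def build_prompt(e: PromptElements) -> str:
    """Join the elements into one prompt, skipping optional ones that are unset."""
    # Lead with the framing when it is given, for example "A wide shot of a woman".
    opening = f"{e.composition} of {e.subject}" if e.composition else e.subject
    parts = [f"{opening} {e.action} {e.context}", e.style, e.camera_motion, e.ambiance]
    return ", ".join(p.strip() for p in parts if p and p.strip()) + "."


prompt = build_prompt(PromptElements(
    subject="a woman",
    context="along the beach at sunset",
    action="walking",
    style="cinematic",
    composition="A wide shot",
    ambiance="warm tones",
))
print(prompt)
```

Starting from a structure like this makes it easy to change one element at a time, such as the context or the ambiance, and compare the resulting videos.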
Examples of core elements
These examples show how to refine your prompts by specifying core elements.
Subject description
The description can include a single subject, multiple subjects, and their actions.
Prompt: An architectural rendering of a white concrete apartment building with flowing organic shapes, seamlessly blending with lush greenery and futuristic elements.
Context
The background or context for your subject is an important part of the prompt. Try placing your subject in different backgrounds, such as on a busy street or in outer space.
Prompt: A satellite floating through outer space with the moon and some stars in the background.
Action
Specify what the subject is doing, such as walking, running, or turning their head.
Prompt: A wide shot of a woman walking along the beach, looking content and relaxed and looking towards the horizon at sunset.
Style
You can add keywords to improve generation quality and steer it closer to the intended style, such as shallow depth of field, movie still, minimalistic, surreal, vintage, futuristic, or double-exposure.
Prompt: Film noir style, man and woman walk on the street, mystery, cinematic, black and white.
Camera motion
Specify camera movements like POV shot, aerial view, tracking drone view, or tracking shot.
Prompt: A POV shot from a vintage car driving in the rain, Canada at night, cinematic.
Composition
Specify how the shot is framed, such as a wide shot, close-up, or low-angle shot.
Prompt: Extreme close-up of an eye with a city reflected in it.
Prompt: Create a video of a wide shot of a surfer walking on a beach with a surfboard, beautiful sunset, cinematic.
Ambiance
Color and lighting help make the video distinctive and convey emotion. The color palette influences the mood and emotional impact of the video. For example, a warm, golden palette can create a romantic and atmospheric feel. Examples of palette and lighting keywords include pastel blue and pink tones, dim ambient lighting, and cold muted tones.
Prompt: A close-up of a girl holding an adorable golden retriever puppy in the park, sunlight.
Prompt: Cinematic close-up shot of a sad woman riding a bus in the rain, cool blue tones, sad mood.
General tips
Use descriptive language: Use adjectives and adverbs to create a clear picture for Veo.
Provide context: Include background information to help the model understand your request.
Reference specific artistic styles: If you have a particular aesthetic in mind, mention specific artistic styles or art movements.
Use prompt engineering tools: Explore prompt engineering tools or resources to help you refine your prompts. For more information, see Introduction to prompting.
Enhance facial details: To improve facial details, specify them as a focus by using words like portrait in your prompt.
Examples of prompt refinement
This section shows how adding detail to your prompts can refine the generated video.
Icicles
This video demonstrates how you can use each of the core elements in your prompt.
Prompt: Close up shot (composition) of melting icicles (subject) on a frozen rock wall (context) with cool blue tones (ambiance), zoomed in (camera motion) maintaining close-up detail of water drips (action).
Man on the phone
These videos demonstrate how you can revise your prompt with more specific details.
Prompt 1: The camera dollies to show a close up of a desperate man in a green trench coat that's making a call on a rotary-style wall phone with a green neon light and a movie scene.
Analysis: This is the initial video generated from the prompt.
Prompt 2: A close-up cinematic shot follows a desperate man in a weathered green trench coat as he dials a rotary phone mounted on a gritty brick wall, bathed in the eerie glow of a green neon sign. The camera dollies in, revealing the tension in his jaw and the desperation etched on his face as he struggles to make the call. The shallow depth of field focuses on his furrowed brow and the black rotary phone, blurring the background into a sea of neon colors and indistinct shadows, creating a sense of urgency and isolation.
Analysis: This more detailed prompt results in a video that is more focused and has a richer environment.
Prompt 3: A video with smooth motion that dollies in on a desperate man in a green trench coat, using a vintage rotary phone against a wall bathed in an eerie green neon glow. The camera starts from a medium distance, slowly moving closer to the man's face, revealing his frantic expression and the sweat on his brow as he urgently dials the phone. The focus is on the man's hands, his fingers fumbling with the dial as he desperately tries to connect. The green neon light casts long shadows on the wall, adding to the tense atmosphere. The scene is framed to emphasize the isolation and desperation of the man, highlighting the stark contrast between the vibrant glow of the neon and the man's grim determination.
Analysis: Adding even more detail gives the subject a realistic expression and creates an intense, vibrant scene.
Snow leopard
The following example shows how adding more detail can generate output that is closer to what you want.
Initial prompt: A cute creature with snow leopard-like fur is walking in winter forest, 3D cartoon style render.
Detailed prompt: Create a short 3D animated scene in a joyful cartoon style. A cute creature with snow leopard-like fur, large expressive eyes, and a friendly, rounded form happily prances through a whimsical winter forest. The scene should feature rounded, snow-covered trees, gentle falling snowflakes, and warm sunlight filtering through the branches. The creature's bouncy movements and wide smile should convey pure delight. Aim for an upbeat, heartwarming tone with bright, cheerful colors and playful animation. Consider adding subtle, whimsical sound effects to enhance the joyful winter atmosphere.
Add audio to your video
Audio is supported by veo-3.0-generate-001 in Preview.
Clearly specify if you want audio. We recommend using separate sentences in your prompt to describe the audio. The following are examples of audio descriptions in a prompt:
Sound effects:
The audio features water splashing in the background.
Add soft music in the background.
Speech:
The man in the red hat says, "Where is the rabbit?" Then the woman in the green dress next to him replies, "There, in the woods."
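The recommendation to describe audio in separate sentences can be sketched in Python as follows. The `add_audio` function and its parameters are hypothetical names chosen for illustration; the sketch only assembles a prompt string and does not call Veo.

```python
def _as_sentence(text: str) -> str:
    """Trim the text and make sure it ends with terminal punctuation or a closing quote."""
    text = text.strip()
    return text if text.endswith((".", "!", "?", '"')) else text + "."


def add_audio(visual_prompt, sound_effects=(), speech=()):
    """Append each audio description as its own sentence after the visual prompt.

    speech is a sequence of (speaker_description, spoken_line) pairs.
    """
    sentences = [_as_sentence(visual_prompt)]
    sentences += [_as_sentence(effect) for effect in sound_effects]
    sentences += [_as_sentence(f'{who} says, "{line}"') for who, line in speech]
    return " ".join(sentences)


prompt_with_audio = add_audio(
    "A wide shot of a waterfall in a rainforest",
    sound_effects=["The audio features water splashing in the background"],
    speech=[("The man in the red hat", "Where is the rabbit?")],
)
print(prompt_with_audio)
```

Keeping the visual description and each audio cue in separate sentences also makes it simple to remove or reword a single cue when you iterate on the prompt.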
Use reference images to generate videos
You can bring images to life by using the image-to-video capability in Veo. You can use your existing assets or use Imagen to generate something new.
Prompt for the input image: Bunny with a chocolate candy bar.
Prompt for the video: Bunny runs away.
When you use image-to-video, we recommend the following:
Ensure that your action and speech descriptions align with each subject in the input image.
If the input image has multiple subjects, clearly specify which character is performing an action or speaking. To differentiate between characters, use their most distinguishing descriptive details. For example:
The man in the red hat.
The woman in the blue dress.
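One way to check a draft against the preceding recommendation is to confirm that every action or speech description names one of the subjects in the input image. The following Python sketch does this with a plain substring match; the `unattributed` function is a hypothetical helper, and the match is deliberately simplistic (it does not resolve pronouns or paraphrases).

```python
def unattributed(descriptions, subjects):
    """Return the descriptions that mention none of the known subject descriptors."""
    lowered = [s.lower() for s in subjects]
    return [d for d in descriptions if not any(s in d.lower() for s in lowered)]


subjects = ["the man in the red hat", "the woman in the blue dress"]
descriptions = [
    'The man in the red hat says, "Where is the rabbit?"',
    "She points at the woods.",  # ambiguous: does not name a subject
]
needs_revision = unattributed(descriptions, subjects)
print(needs_revision)
```

Any description the check returns is a candidate for rewriting with the subject's distinguishing detail, for example "The woman in the blue dress points at the woods."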
Negative prompts
Negative prompts specify the elements that you want to exclude from your video.
❌ Avoid instructive language or words like no or don't. For example, "No walls" or "don't show walls".
✅ Describe what you don't want to see. For example, to exclude a wall or a frame from the video, use "wall, frame".
Prompt (used for both videos): Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should feature a gentle, atmospheric soundtrack and use a warm, inviting color palette.
Negative prompt (second video only): urban background, man-made structures, dark, stormy, or threatening atmosphere.
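The guidance to describe unwanted elements rather than give instructions can be illustrated with a small Python sketch. The `to_negative_prompt` function is a hypothetical helper, and the list of instructive prefixes it strips is an assumption that covers only a few common English phrasings.

```python
import re

# Instructive lead-ins to strip, such as "No walls" or "don't show walls".
# This prefix list is illustrative, not exhaustive.
_INSTRUCTIVE = re.compile(
    r"^(?:no|don['’]t show|do not show|don['’]t include|do not include|without)\s+",
    re.IGNORECASE,
)


def to_negative_prompt(items):
    """Turn draft exclusions into a comma-separated list of plain descriptions."""
    cleaned = []
    for item in items:
        term = _INSTRUCTIVE.sub("", item.strip()).strip(" .")
        # Skip empty terms and case-insensitive duplicates.
        if term and term.lower() not in (c.lower() for c in cleaned):
            cleaned.append(term)
    return ", ".join(cleaned)


negative_prompt = to_negative_prompt(["No wall", "don't show frame", "wall"])
print(negative_prompt)
```

The result is a plain list of things to exclude, in the same form as the "wall, frame" example.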
Set aspect ratios
Vertex AI Veo supports the following two aspect ratios for video generation:
Widescreen (16:9): The 16:9 aspect ratio is the most common for televisions, monitors, and mobile phone screens in landscape orientation. Use this ratio to capture more of the background, such as scenic landscapes.
Portrait (9:16): This is a rotated widescreen format, popularized by short-form video applications like YouTube Shorts. Use this aspect ratio for portraits or tall objects with strong vertical orientations, such as buildings, trees, or waterfalls.
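The choice between the two ratios can be captured in a short Python sketch. The constant and function names are hypothetical, and this guide does not cover how the chosen value is passed to Veo, so the sketch stops at selecting and validating the ratio string.

```python
# The two aspect ratios described above; the labels are for readability only.
SUPPORTED_ASPECT_RATIOS = {"16:9": "widescreen", "9:16": "portrait"}


def pick_aspect_ratio(tall_subject: bool) -> str:
    """Suggest portrait for tall, vertical subjects and widescreen otherwise."""
    return "9:16" if tall_subject else "16:9"


def validate_aspect_ratio(value: str) -> str:
    """Return the ratio without spaces, or raise ValueError if it is unsupported."""
    normalized = value.replace(" ", "")
    if normalized not in SUPPORTED_ASPECT_RATIOS:
        supported = ", ".join(SUPPORTED_ASPECT_RATIOS)
        raise ValueError(f"Unsupported aspect ratio {value!r}; expected one of: {supported}")
    return normalized


waterfall_ratio = pick_aspect_ratio(tall_subject=True)
landscape_ratio = validate_aspect_ratio("16 : 9")
print(waterfall_ratio, landscape_ratio)
```

Validating the value before sending a request catches typos such as "4:3" early, instead of after a failed generation.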
Widescreen (16:9)
The following is an example prompt for the widescreen aspect ratio.
Prompt: Create a video with a tracking drone view of a man driving a red convertible car in Palm Springs, 1970s, warm sunlight, long shadows.
Portrait (9:16)
The following is an example prompt for the portrait aspect ratio.
Prompt: Create a video with a smooth motion of a majestic Hawaiian waterfall within a lush rainforest. Focus on realistic water flow, detailed foliage, and natural lighting to convey tranquility. Capture the rushing water, misty atmosphere, and dappled sunlight filtering through the dense canopy. Use smooth, cinematic camera movements to showcase the waterfall and its surroundings. Aim for a peaceful, realistic tone, transporting the viewer to the serene beauty of the Hawaiian rainforest.
Last updated 2025-08-23 UTC.