Video Generation Prompt Guide
1) Quickstart
Basic Prompting Sandwich
A simple way to structure a strong video prompt is:
[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]
Cinematography: Define the camera work and shot composition.
Subject: Identify the main character or focal point.
Action: Describe what the subject is doing.
Context: Detail the environment and background elements.
Style & Ambiance: Specify the overall aesthetic, mood, and lighting.
Example
Medium shot, a tired corporate worker, rubbing his temples in exhaustion, in front of a bulky 1980s computer in a cluttered office late at night. The scene is lit by the harsh fluorescent overhead lights and the green glow of the monochrome monitor. Retro aesthetic, shot as if on 1980s color film, slightly grainy.
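The five-part structure can also be assembled programmatically. The following is a minimal Python sketch; the function name `build_prompt` and its parameter names are hypothetical and are not part of any Veo API.

```python
def build_prompt(cinematography, subject, action, context, style):
    """Join the five prompt elements in the recommended order.

    Empty elements are skipped so a partial prompt still reads cleanly.
    """
    parts = [cinematography, subject, action, context, style]
    # Strip stray whitespace and trailing punctuation before joining.
    cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
    return ", ".join(cleaned) + "."


prompt = build_prompt(
    cinematography="Medium shot",
    subject="a tired corporate worker",
    action="rubbing his temples in exhaustion",
    context="in front of a bulky 1980s computer in a cluttered office late at night",
    style="retro aesthetic, shot as if on 1980s color film, slightly grainy",
)
print(prompt)
```

Keeping the elements as separate arguments makes it easy to swap one of them (for example, the shot type) while holding the rest constant.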
2) Break the Prompt Into Clear Elements
Cinematography
Describe how the scene should be filmed.
You can specify:
Shot Type: wide establishing, medium, close-up, extreme close-up, macro, over-the-shoulder, long shot, POV
Camera Angle: eye-level, low angle, high angle, top-down, worm's-eye, Dutch angle
Camera Movement: slow dolly-in, pan, tilt, locked-off tripod, handheld tracking, orbit, crane shot, static shot (or fixed), truck, pedestal, zoom, drone, whip pan, arc shot
Lens & Focus: 35mm lens, shallow depth of field, soft bokeh, anamorphic flares, wide angle, telephoto, deep depth of field, lens flare, rack focus, fisheye, vertigo effect (dolly zoom)
Example:
Subject
State the main focus of the shot. Be specific. Include defining details like color, material, age, species, clothing, or expression where helpful.
Examples:
People: generic descriptors (man, woman, elderly person), specific professions, historical figures, mythical beings ("mischievous fairy", "a stoic knight")
Animals or creatures: specific breeds of animals, fantastical creatures ("a miniature dragon with iridescent scales", "a wise, ancient talking tree")
Objects: everyday items, vehicles, abstract shapes ("glowing orbs", "crystalline structures")
Action
Describe what the subject is doing. The clearer the action, the more predictable the animation.
You can specify:
Basic movements: walking, running, jumping, flying, swimming, dancing, spinning, falling, standing still, sitting
Interactions: talking, laughing, arguing, hugging, fighting, playing a game, cooking, building, writing, reading, observing
Emotional expressions: smiling, frowning, surprise, concentrating deeply, appearing thoughtful, showing excitement, crying
Subtle actions: a gentle breeze ruffling hair, leaves rustling, a subtle nod, fingers tapping impatiently, eyes blinking slowly
Transformations or processes: a flower blooming in fast-motion, ice melting, a city skyline developing over time (however, keep clip length in mind for events that occur over a longer period)
Context
Describe the setting or situation around the subject. Context grounds the scene and prevents generic outputs.
You can specify:
Location (interior): a cozy living room with a crackling fireplace, a sterile futuristic laboratory, a cluttered artist's studio, a grand ballroom, a dusty attic
Location (exterior): a sun-drenched tropical beach, a misty ancient forest, a bustling futuristic cityscape at night, a serene mountain peak at dawn, a desolate alien planet
Time of day: golden hour, midday sun, twilight, deep night, pre-dawn
Weather: clear blue sky, overcast and gloomy, light drizzle, heavy thunderstorm with visible lightning, gentle snowfall, swirling fog
Historical or fantastical period: a medieval castle courtyard, a roaring 1920s jazz club, a cyberpunk alleyway, an enchanted forest glade
Atmospheric details: floating dust motes in a sunbeam, shimmering heat haze, reflections on wet pavement, leaves scattered by the wind
Style & Ambiance
Describe the look and mood of the video. Instead of writing something broad like “epic vibe,” define the mood through lighting, palette, and texture.
You can specify:
Style: cinematic realism, stop-motion feel, hand-drawn animation, film noir, retro VHS, photorealistic, cinematic, animation, art movements/artists, specific looks
Lighting: soft key light, warm tungsten practicals, cool moonlight rim light, natural light, artificial light, cinematic lighting ("Rembrandt lighting on a portrait"), specific effects ("volumetric lighting creating visible light rays")
Tone or Mood: happy/joyful, sad/melancholy, suspenseful/tense, peaceful/serene, epic/grandiose, futuristic/sci-fi, vintage/retro, romantic, horror
Ambiance: color palettes, atmospheric effects, textural qualities
For an in-depth guide to video generation elements, see the Veo Prompt Guide.
3) Be Specific and Concrete
Replace vague language with visual direction.
Specific prompts usually produce more controllable outputs because the model has less room to guess. Google’s prompt guidance also emphasizes using detailed, explicit instructions rather than abstract descriptors.
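A quick way to catch vague wording before submitting a prompt is a simple word check. The sketch below is illustrative only: the `VAGUE_TERMS` list and the function name are assumptions, and the list should be adapted to your own habits.

```python
import re

# Hypothetical starter list of vague descriptors; extend it for your own use.
VAGUE_TERMS = {"epic", "nice", "cool", "beautiful", "amazing", "vibe"}


def find_vague_terms(prompt):
    """Return the vague terms that appear in the prompt, in sorted order."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return sorted(words & VAGUE_TERMS)


print(find_vague_terms("An epic shot of a cool car"))
print(find_vague_terms("Low-angle shot of a red 1960s coupe at golden hour"))
```

Each flagged word is a cue to substitute a concrete visual detail such as a shot type, a color, or a lighting setup.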
Weaker:
Better:
4) Keep the Prompt Organized
Write in short, readable lines or short grouped sections.
A clean structure often works better than one long paragraph. For example:
Format: 16:9, 6 seconds, cinematic realism.
Subject + Action: A matte-black skincare bottle rotates slowly on a wet stone slab.
Cinematography: Locked-off tripod. 50mm lens. Shallow depth of field.
Style & Ambiance: Soft cool backlight, subtle bloom, low-contrast filmic grade.
This kind of structure reduces ambiguity and makes it easier to revise later.
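If prompts are generated or revised in bulk, the grouped layout can be produced from labeled sections. This is a minimal sketch; the helper name `format_sections` and the "Label: text" separator are assumptions rather than a required syntax.

```python
def format_sections(sections):
    """Render (label, text) pairs as one short labeled line each."""
    return "\n".join(f"{label}: {text}" for label, text in sections if text)


sections = [
    ("Format", "16:9, 6 seconds, cinematic realism."),
    ("Subject + Action", "A matte-black skincare bottle rotates slowly on a wet stone slab."),
    ("Cinematography", "Locked-off tripod. 50mm lens. Shallow depth of field."),
    ("Style & Ambiance", "Soft cool backlight, subtle bloom, low-contrast filmic grade."),
]
organized_prompt = format_sections(sections)
print(organized_prompt)
```

Because each section is a separate entry, a revision only touches the line that needs to change.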
5) Define Constraints Early
Put core technical constraints near the top of the prompt:
aspect ratio
clip length
frame style or format
resolution if supported
general visual mode
Example:
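As an illustrative sketch (not taken from Google's guidance), the constraints can be collected into a single header line that always precedes the descriptive text. The function names and field order here are hypothetical.

```python
def constraint_header(aspect_ratio, seconds, visual_mode, resolution=None):
    """Build the first line of a prompt from core technical constraints."""
    fields = [aspect_ratio, f"{seconds} seconds"]
    if resolution:  # include only when the target model supports it
        fields.append(resolution)
    fields.append(visual_mode)
    return ", ".join(fields) + "."


def with_constraints_first(header, body):
    """Place the constraint line above the descriptive part of the prompt."""
    return f"{header}\n{body.strip()}"


header = constraint_header("16:9", 6, "cinematic realism")
print(with_constraints_first(
    header, "A matte-black skincare bottle rotates slowly on a wet stone slab."
))
```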
6) Write Negative Prompts the Right Way
When excluding elements, avoid awkward phrasing like:
“don’t show buildings”
“no scary mood”
“don’t make it dark”
Instead, phrase the exclusion clearly as a negative prompt list.
Example:
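One possible way to convert instruction-style exclusions into a plain list of terms is sketched below. The set of prefixes it strips and the "Negative prompt:" label are assumptions; whether the list belongs in a separate negative-prompt field or in the prompt text depends on the tool you use.

```python
import re

# Leading phrases that turn an exclusion into an instruction (assumed list).
_NEGATION_PREFIX = re.compile(
    r"^(?:don['’]t|do not|no|without|avoid)\s+(?:show(?:ing)?\s+|make it\s+)?",
    re.IGNORECASE,
)


def to_negative_terms(exclusions):
    """Strip instruction-style phrasing, leaving bare terms to exclude."""
    terms = []
    for phrase in exclusions:
        term = _NEGATION_PREFIX.sub("", phrase.strip()).strip(" .")
        if term and term not in terms:
            terms.append(term)
    return terms


terms = to_negative_terms(
    ["don’t show buildings", "no scary mood", "don't make it dark"]
)
print("Negative prompt: " + ", ".join(terms))
```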
7) Think in Story Beats
Even short clips work better when they have internal structure.
For a 6-8 second clip, you can think in simple beats:
Beat 1: establish subject and setting
Beat 2: introduce motion or action
Beat 3: end on a reveal, expression, or hold
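The three-beat idea can be sketched as a small helper that divides the clip length evenly and labels each beat with its time window. The even split and the function name `beat_outline` are assumptions; real clips may need uneven beats.

```python
def beat_outline(total_seconds, beats):
    """Split a clip evenly into beats and label each with its time window."""
    span = total_seconds / len(beats)
    lines = []
    for i, description in enumerate(beats):
        start, end = i * span, (i + 1) * span
        lines.append(f"Beat {i + 1} ({start:.1f}s-{end:.1f}s): {description}")
    return lines


outline = beat_outline(6, [
    "a lighthouse on a rocky shore at dusk",
    "the lamp flickers on and begins to rotate",
    "hold on the beam sweeping across the water",
])
print("\n".join(outline))
```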
8) Use Reference Images for Consistency
Use reference inputs when you want:
the same character across multiple shots
the same product design across variations
the same environment or art direction
more visual consistency from scene to scene
9) Veo 3.1 Prompting Guide
Tips
Native Audio Prefixes: Veo 3.1 is one of the few models that allows for direct audio-to-video synchronization within the prompt using specific tags:
300-Character Ceiling: Veo 3.1 is highly sensitive to prompt length. Aim for 150–300 characters. Prompts exceeding 400 characters often lead to "prompt leakage," where the model ignores the latter half of your instructions.
Timestamp-Based Prompting: You can direct specific pacing by adding time markers.
Multi-Reference "Ingredients": Instead of a single image-to-video reference, Veo 3.1 supports up to three image inputs. Use this to maintain consistency for a specific product, a specific character, and a specific background simultaneously.
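The character-count guidance in the tips can be turned into a quick pre-flight check. This sketch uses the 150-300 target and the 400-character threshold stated in the tips; the function name and the status labels are hypothetical.

```python
def check_prompt_length(prompt, target=(150, 300), hard_limit=400):
    """Classify a prompt's character count against the guide's thresholds."""
    n = len(prompt)
    low, high = target
    if n > hard_limit:
        status = "too long"
    elif n > high:
        status = "over target"
    elif n < low:
        status = "under target"
    else:
        status = "ok"
    return n, status


print(check_prompt_length("Medium shot, a tired corporate worker."))
```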
Audio Prompting
For Veo 3.1, audio should be treated as one of the main parts of the prompt when relevant. Google’s latest Veo guide explicitly highlights synchronized audio, including dialogue, ambient sound, and sound effects.
Useful audio elements include:
dialogue
ambient room tone
environmental sounds
sound effects
music or no music
voice tone, pace, style, or accent
A good rule is to describe audio in separate lines.
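A minimal sketch of the separate-lines rule follows. The labels `Ambient:`, `SFX:`, `Dialogue:`, and `Music:` are illustrative assumptions, not confirmed Veo tags, and the function name is hypothetical.

```python
def add_audio_lines(visual_prompt, ambient=None, sfx=None, dialogue=None, music=None):
    """Append each audio element as its own labeled line after the visuals."""
    lines = [visual_prompt.strip()]
    if ambient:
        lines.append(f"Ambient: {ambient}")
    if sfx:
        lines.append(f"SFX: {sfx}")
    if dialogue:
        speaker, text = dialogue
        lines.append(f'Dialogue: {speaker} says, "{text}"')
    # State the absence of music explicitly rather than leaving it open.
    lines.append(f"Music: {music}" if music else "Music: none")
    return "\n".join(lines)


audio_prompt = add_audio_lines(
    "Close-up, a barista pours latte art in a quiet cafe.",
    ambient="low cafe chatter",
    sfx="milk steaming",
    dialogue=("The barista", "Here you go."),
)
print(audio_prompt)
```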
No Dialogue Example
With Dialogue Example
Timestamp Prompting
Beyond loose story beats, Veo 3.1 supports timestamp-based prompting for more explicit scene pacing. Google’s latest guidance includes timestamped shot descriptions as a way to control the sequence of events in a short clip.
Example:
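As a hedged illustration, timestamped lines can be generated from a list of segments. The bracketed `[MM:SS-MM:SS]` format and the helper names are assumptions; check the current Veo documentation for the exact marker syntax it expects.

```python
def format_timestamp(seconds):
    """Render whole seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def timestamped_prompt(segments):
    """Turn (start, end, description) tuples into one timestamped line each.

    Raises ValueError if segments overlap or run backwards.
    """
    lines, previous_end = [], 0
    for start, end, description in segments:
        if start < previous_end or end <= start:
            raise ValueError(f"bad segment: {start}-{end}")
        lines.append(
            f"[{format_timestamp(start)}-{format_timestamp(end)}] {description}"
        )
        previous_end = end
    return "\n".join(lines)


timeline = timestamped_prompt([
    (0, 2, "Wide shot of a lighthouse at dusk."),
    (2, 4, "The lamp flickers on and begins to rotate."),
    (4, 6, "Hold on the beam sweeping across the water."),
])
print(timeline)
```

Validating the segments up front avoids sending a prompt whose time windows contradict each other.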
Video Generation Workflows
Ingredients to Video
Prepare visual ingredients, such as characters, props, or settings, then animate them into a video scene. Google specifically highlights this for building multi-shot scenes with consistent characters and dialogue.
How it works:
Generate your "ingredients": the reference images for each character, prop, or setting.
Compose the scene: Use the Ingredients to Video feature with the relevant reference images.
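The two steps can be modeled with a small container that tracks which reference image plays which role. This is a sketch only: `IngredientsRequest` is a hypothetical class, not a Veo API object, and the file paths are placeholders.

```python
from dataclasses import dataclass, field

MAX_REFERENCE_IMAGES = 3  # limit stated in the Veo 3.1 tips of this guide


@dataclass
class IngredientsRequest:
    """Hypothetical container pairing a prompt with its reference images."""
    prompt: str
    reference_images: dict = field(default_factory=dict)  # role -> file path

    def add_reference(self, role, path):
        """Register an image for a role, enforcing the three-image limit."""
        is_new_role = role not in self.reference_images
        if is_new_role and len(self.reference_images) >= MAX_REFERENCE_IMAGES:
            raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
        self.reference_images[role] = path


request = IngredientsRequest("The detective questions the witness in the rainy alley.")
request.add_reference("character", "detective.png")
request.add_reference("prop", "notebook.png")
request.add_reference("setting", "alley.png")
print(sorted(request.reference_images))
```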
Use this when:
you need repeatable character design
you want a product to remain consistent
you want a scene to feel like the same world across shots
you are building a dialogue scene with recurring subjects
First and Last Frame
You provide a starting image and an ending image, then prompt the model to generate the transition between them. Google’s Veo 3.1 guide explicitly describes this feature and recommends describing both the motion and the audio for the in-between sequence.
Use this when you want:
a controlled transformation between two states
a reveal from one composition to another
a precise transition arc
a stylized before/after movement
Example
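A minimal sketch of a transition prompt that always covers both motion and audio, as recommended for this workflow. The function name and the `Motion:` / `Audio:` labels are assumptions for illustration.

```python
def transition_prompt(motion, audio):
    """Compose an in-between description covering both motion and audio."""
    if not motion.strip() or not audio.strip():
        raise ValueError("describe both the motion and the audio")
    return f"Motion: {motion.strip()}\nAudio: {audio.strip()}"


between_frames = transition_prompt(
    motion="The camera arcs slowly around the bud as its petals open.",
    audio="Soft morning birdsong, no music.",
)
print(between_frames)
```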
For an in-depth guide to video generation workflows, see the Veo Prompt Guide.
Prompt Templates
General Video Generation Template
Product Hero with Native Audio
Character with Dialogue
First-Frame / Image-to-Video Variation

