Video Generation Prompt Guide


1) Quickstart

Basic Prompting Sandwich

A simple way to structure a strong video prompt is:

[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]

  1. Cinematography: Define the camera work and shot composition.

  2. Subject: Identify the main character or focal point.

  3. Action: Describe what the subject is doing.

  4. Context: Detail the environment and background elements.

  5. Style & ambiance: Specify the overall aesthetic, mood, and lighting.

Example

Medium shot, a tired corporate worker, rubbing his temples in exhaustion, in front of a bulky 1980s computer in a cluttered office late at night. The scene is lit by the harsh fluorescent overhead lights and the green glow of the monochrome monitor. Retro aesthetic, shot as if on 1980s color film, slightly grainy.
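If you generate prompts from structured data, you can apply the sandwich order mechanically. Below is a minimal Python sketch. `build_prompt` and its parameter names are hypothetical and not part of any Veo tool. It joins the five parts in the order given above and skips any part left empty.

```python
def build_prompt(cinematography: str, subject: str, action: str,
                 context: str, style: str) -> str:
    """Join the five sandwich parts in order: cinematography, subject,
    action, context, style & ambiance. Empty parts are skipped."""
    parts = [cinematography, subject, action, context, style]
    # Trim whitespace and trailing periods so the comma joins stay clean.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    if not cleaned:
        return ""
    return ", ".join(cleaned) + "."


prompt = build_prompt(
    "Medium shot",
    "a tired corporate worker",
    "rubbing his temples in exhaustion",
    "in front of a bulky 1980s computer in a cluttered office late at night",
    "retro aesthetic, slightly grainy 1980s color film",
)
print(prompt)
```

Keeping the parts separate until the last step also makes it easy to vary one element, such as the shot type, while holding the rest fixed.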


2) Break the Prompt Into Clear Elements

Cinematography

Describe how the scene should be filmed.

You can specify:

  • Shot Type: wide establishing, medium, close-up, extreme close-up, macro, over-the-shoulder, long shot, POV

  • Camera Angle: eye-level, low angle, high angle, top-down, worm's-eye, Dutch angle

  • Camera Movement: slow dolly-in, pan, tilt, locked-off tripod, handheld tracking, orbit, crane shot, static shot (or fixed), truck, pedestal, zoom, drone, whip pan, arc shot

  • Lens & Focus: 35mm lens, shallow depth of field, soft bokeh, anamorphic flares, wide angle, telephoto, deep depth of field, lens flare, rack focus, fisheye, vertigo effect (dolly zoom)

Example:

Subject

State the main focus of the shot. Be specific. Include defining details like color, material, age, species, clothing, or expression where helpful.

Examples:

  • People: generic descriptors (man, woman, elderly person), specific professions, historical figures, mythical beings ("a mischievous fairy", "a stoic knight")

  • Animals or creatures: specific animal breeds, fantastical creatures ("a miniature dragon with iridescent scales", "a wise, ancient talking tree")

  • Objects: everyday items, vehicles, abstract shapes ("glowing orbs", "crystalline structures")

Action

Describe what the subject is doing. The clearer the action, the more predictable the animation.

You can specify:

  • Basic movements: walking, running, jumping, flying, swimming, dancing, spinning, falling, standing still, sitting

  • Interactions: talking, laughing, arguing, hugging, fighting, playing a game, cooking, building, writing, reading, observing

  • Emotional expressions: smiling, frowning, looking surprised, concentrating deeply, appearing thoughtful, showing excitement, crying

  • Subtle actions: a gentle breeze ruffling hair, leaves rustling, a subtle nod, fingers tapping impatiently, eyes blinking slowly

  • Transformations or processes: a flower blooming in fast motion, ice melting, a city skyline developing over time (keep clip length in mind: events that normally unfold over a long period have to be compressed into a few seconds)

Context

Describe the setting or situation around the subject. Context grounds the scene and prevents generic outputs.

You can specify:

  • Location (interior): a cozy living room with a crackling fireplace, a sterile futuristic laboratory, a cluttered artist's studio, a grand ballroom, a dusty attic

  • Location (exterior): a sun-drenched tropical beach, a misty ancient forest, a bustling futuristic cityscape at night, a serene mountain peak at dawn, a desolate alien planet

  • Time of day: golden hour, midday sun, twilight, deep night, pre-dawn

  • Weather: clear blue sky, overcast and gloomy, light drizzle, heavy thunderstorm with visible lightning, gentle snowfall, swirling fog

  • Historical or fantastical period: a medieval castle courtyard, a roaring 1920s jazz club, a cyberpunk alleyway, an enchanted forest glade

  • Atmospheric details: floating dust motes in a sunbeam, shimmering heat haze, reflections on wet pavement, leaves scattered by the wind

Style & Ambiance

Describe the look and mood of the video. Instead of writing something broad like “epic vibe,” define the mood through lighting, palette, and texture.

You can specify:

  • Style: cinematic realism, stop-motion feel, hand-drawn animation, film noir, retro VHS, photorealistic, cinematic, animation, art movements/artists, specific looks

  • Lighting: soft key light, warm tungsten practicals, cool moonlight rim light, natural light, artificial light, cinematic lighting ("Rembrandt lighting on a portrait"), specific effects ("volumetric lighting creating visible light rays")

  • Tone or Mood: happy/joyful, sad/melancholy, suspenseful/tense, peaceful/serene, epic/grandiose, futuristic/sci-fi, vintage/retro, romantic, horror

  • Ambiance: color palettes, atmospheric effects, textural qualities

For an in-depth guide to video generation elements, see the Veo Prompt Guide.


3) Be Specific and Concrete

Replace vague language with visual direction.

Specific prompts usually produce more controllable outputs because the model has less room to guess. Google’s prompt guidance also emphasizes using detailed, explicit instructions rather than abstract descriptors.

Weaker:

Better:


4) Keep the Prompt Organized

Write in short, readable lines or short grouped sections.

A clean structure often works better than one long paragraph. For example:

Format: 16:9, 6 seconds, cinematic realism.

Subject + Action: A matte-black skincare bottle rotates slowly on a wet stone slab.

Cinematography: Locked-off tripod. 50mm lens. Shallow depth of field.

Style & Ambiance: Soft cool backlight, subtle bloom, low-contrast filmic grade.

This kind of structure reduces ambiguity and makes it easier to revise later.
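If you keep prompts in code or a spreadsheet, a small formatter can produce this layout for you. The sketch below is a hypothetical Python helper (`format_sections` is not part of any Veo tool). It writes each labeled section on its own line, using the skincare example above as input.

```python
def format_sections(sections: dict[str, str]) -> str:
    """Render each labeled section on its own line as 'Label: text'.
    Dicts keep insertion order, so sections appear in the order written."""
    lines = []
    for label, text in sections.items():
        text = text.strip()
        if text:  # skip empty sections instead of emitting a bare label
            lines.append(f"{label}: {text}")
    return "\n".join(lines)


prompt = format_sections({
    "Format": "16:9, 6 seconds, cinematic realism.",
    "Subject + Action": "A matte-black skincare bottle rotates slowly on a wet stone slab.",
    "Cinematography": "Locked-off tripod. 50mm lens. Shallow depth of field.",
    "Style & Ambiance": "Soft cool backlight, subtle bloom, low-contrast filmic grade.",
})
print(prompt)
```

Because each section is a separate entry, a revision touches one line of the prompt rather than a whole paragraph.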


5) Define Constraints Early

Put core technical constraints near the top of the prompt:

  • aspect ratio

  • clip length

  • frame style or format

  • resolution, if supported

  • general visual mode
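One way to make sure these constraints always lead the prompt is to keep them in a small record and render them as the first line. This is a minimal Python sketch under stated assumptions: `Constraints` and `with_constraints_first` are hypothetical names, and values such as "1080p" are illustrative only. Check which resolutions and aspect ratios your model actually supports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    aspect_ratio: str                  # e.g. "16:9"
    duration_s: int                    # clip length in seconds
    mode: str                          # general visual mode, e.g. "cinematic realism"
    frame_style: Optional[str] = None  # frame style or format (illustrative)
    resolution: Optional[str] = None   # include only if the model supports it

    def header(self) -> str:
        """One line with the constraints in a fixed order."""
        parts = [self.aspect_ratio, f"{self.duration_s} seconds"]
        for optional in (self.frame_style, self.resolution):
            if optional:
                parts.append(optional)
        parts.append(self.mode)
        return ", ".join(parts) + "."


def with_constraints_first(constraints: Constraints, body: str) -> str:
    """Put the constraint line at the top, ahead of the creative description."""
    return constraints.header() + "\n" + body.strip()


prompt = with_constraints_first(
    Constraints("16:9", 8, "cinematic realism", resolution="1080p"),
    "A lighthouse keeper climbs a spiral staircase at dusk.",
)
print(prompt)
```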

Example:


6) Write Negative Prompts the Right Way

When excluding elements, avoid awkward phrasing like:

  • “don’t show buildings”

  • “no scary mood”

  • “don’t make it dark”

Instead, list the unwanted elements themselves as a negative prompt, without words like "no" or "don't" (for example, "buildings, scary mood, dark lighting").
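Exclusions collected from notes or user input are often written in exactly the instruction style shown above. Here is a rough Python sketch that strips a few such openers and returns a comma-separated negative prompt. `to_negative_prompt` is a hypothetical helper, and the prefix pattern is an illustrative assumption that is far from exhaustive, so review the result by hand.

```python
import re

# A few instruction-style openers to strip. Illustrative only, not exhaustive.
_INSTRUCTION_PREFIX = re.compile(
    r"^(?:don['’]t\s+(?:show|include|make\s+it)\s+|no\s+)", re.IGNORECASE
)


def to_negative_prompt(exclusions: list[str]) -> str:
    """Turn phrases like "don't show buildings" into a plain,
    comma-separated list of the unwanted elements themselves."""
    terms: list[str] = []
    for item in exclusions:
        term = _INSTRUCTION_PREFIX.sub("", item.strip()).strip()
        if term and term not in terms:  # drop blanks and duplicates, keep order
            terms.append(term)
    return ", ".join(terms)


print(to_negative_prompt(["don’t show buildings", "no scary mood", "don’t make it dark"]))
# buildings, scary mood, dark
```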

Example:


7) Think in Story Beats

Even short clips work better when they have internal structure.

For a 6-8 second clip, you can think in simple beats:

  • Beat 1: establish subject and setting

  • Beat 2: introduce motion or action

  • Beat 3: end on a reveal, expression, or hold
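To decide roughly how much of the clip each beat gets, you can split the duration in proportion to a weight per beat. This is a minimal Python sketch. `allocate_beats` is a hypothetical helper, and the 1:2:1 split (short setup, longer action, short hold) is only an example, not a rule from this guide.

```python
def allocate_beats(duration_s: int, weights: list[float]) -> list[tuple[int, int]]:
    """Split a clip of `duration_s` seconds into consecutive (start, end)
    ranges, one per beat, sized in proportion to `weights` and rounded to
    whole seconds."""
    if duration_s <= 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("duration and all weights must be positive")
    total = sum(weights)
    bounds = [0]
    running = 0.0
    for w in weights:
        running += w
        bounds.append(round(duration_s * running / total))
    return list(zip(bounds[:-1], bounds[1:]))


# Establish, act, then hold, with the action beat getting twice the time.
print(allocate_beats(8, [1, 2, 1]))  # [(0, 2), (2, 6), (6, 8)]
```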


8) Use Reference Images for Consistency

Use reference inputs when you want:

  • the same character across multiple shots

  • the same product design across variations

  • the same environment or art direction

  • more visual consistency from scene to scene


9) Veo 3.1 Prompting Guide

Tips

  1. Native Audio Prefixes: Veo 3.1 is one of the few models that supports direct audio-to-video synchronization within the prompt using specific tags.

  2. 300-Character Ceiling: Veo 3.1 is highly sensitive to prompt length. Aim for 150–300 characters. Prompts exceeding 400 characters often lead to "prompt leakage," where the model ignores the latter half of your instructions.

  3. Timestamp-Based Prompting: You can direct specific pacing by adding time markers.

  4. Multi-Reference "Ingredients": Instead of just one "Image-to-Video" reference, Veo 3.1 supports up to three image inputs. Use this to maintain consistency for a specific product, a specific character, and a specific background simultaneously.
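The length tip is easy to check before you submit a prompt. Below is a minimal Python sketch that classifies a prompt against the 150–300 character target and the 400-character risk threshold from the tip above. Those figures come from this guide, not from an official limit. `check_prompt_length` is a hypothetical helper.

```python
def check_prompt_length(prompt: str, target: tuple[int, int] = (150, 300),
                        risk_above: int = 400) -> str:
    """Classify a prompt against the length guidance in the tips above."""
    n = len(prompt)
    low, high = target
    if n > risk_above:
        return "too long"  # later instructions may be ignored
    if n > high:
        return "long"      # over the target range, below the risk threshold
    if n < low:
        return "short"     # under the target range
    return "ok"


for n in (100, 250, 350, 450):
    print(n, check_prompt_length("x" * n))
```

This prints `short`, `ok`, `long` and `too long` for the four lengths.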

Audio Prompting

For Veo 3.1, audio should be treated as one of the main parts of the prompt when relevant. Google’s latest Veo guide explicitly highlights synchronized audio, including dialogue, ambient sound, and sound effects.

Useful audio elements include:

  • dialogue

  • ambient room tone

  • environmental sounds

  • sound effects

  • music or no music

  • voice tone, pace, style, or accent

A good rule is to describe audio in separate lines.
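To keep audio on separate lines consistently, you can render each element from structured input. This is a hypothetical Python sketch. The labels ("SFX:", "Ambient noise:", quoted dialogue after "says:") are an assumed convention modeled on the kind of cues shown in Google's Veo guidance, not required syntax. Music is always stated explicitly, so the prompt says "none" when you want no music.

```python
from typing import Optional


def audio_block(dialogue: Optional[list[tuple[str, str]]] = None,
                sfx: Optional[list[str]] = None,
                ambient: Optional[str] = None,
                music: Optional[str] = None) -> str:
    """Render each audio element on its own line (labels are illustrative)."""
    lines = []
    for speaker, line in dialogue or []:
        lines.append(f'{speaker} says: "{line}"')
    for effect in sfx or []:
        lines.append(f"SFX: {effect}")
    if ambient:
        lines.append(f"Ambient noise: {ambient}")
    # Always state the music choice, so "no music" is explicit rather than implied.
    lines.append(f"Music: {music if music else 'none'}")
    return "\n".join(lines)


print(audio_block(
    dialogue=[("The barista", "Your usual?")],
    sfx=["espresso machine hissing"],
    ambient="quiet chatter in a small coffee shop",
))
```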

No Dialogue Example

With Dialogue Example

Timestamp Prompting

Aside from story beats, Veo 3.1 now goes further by supporting timestamp-based prompting for more explicit scene pacing. Google’s latest guidance includes timestamped shot descriptions as a way to control the sequence of events in a short clip.
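A timestamped prompt is a list of consecutive time ranges, each with its own shot description. The Python sketch below formats such segments one per line. The `[MM:SS-MM:SS]` marker style is an assumption; adjust it to whatever format you use. It also rejects gaps or overlaps, so the segments cover the clip continuously. `timestamped_prompt` is a hypothetical helper.

```python
def timestamped_prompt(segments: list[tuple[int, int, str]]) -> str:
    """Format (start_s, end_s, description) segments as one line each,
    using an assumed [MM:SS-MM:SS] marker style."""
    def mmss(seconds: int) -> str:
        return f"{seconds // 60:02d}:{seconds % 60:02d}"

    lines = []
    previous_end = 0
    for start, end, text in segments:
        # Each segment must begin where the previous one ended and have positive length.
        if start != previous_end or end <= start:
            raise ValueError(f"segment {start}-{end} must start at {previous_end} and have positive length")
        lines.append(f"[{mmss(start)}-{mmss(end)}] {text.strip()}")
        previous_end = end
    return "\n".join(lines)


print(timestamped_prompt([
    (0, 2, "Wide shot of an empty train platform at dawn."),
    (2, 6, "A woman runs down the stairs, coat flapping."),
    (6, 8, "Close-up as the doors close in front of her."),
]))
```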

Example:

Video Generation Workflows

Ingredients to Video

Prepare visual ingredients, such as characters, props, or settings, then animate them into a video scene. Google specifically highlights this for building multi-shot scenes with consistent characters and dialogue.

How it works:

  1. Generate your "ingredients": reference images of the characters, props, or settings you want to reuse.

  2. Compose the scene: use the Ingredients to Video feature with the relevant reference images.

Use this when:

  • you need repeatable character design

  • you want a product to remain consistent

  • you want a scene to feel like the same world across shots

  • you are building a dialogue scene with recurring subjects
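Before composing a scene, it helps to pair each reference image with what it should keep consistent and to stay within the three-image limit mentioned in the tips. This Python sketch is hypothetical. `Reference`, `plan_ingredients` and the returned dict are illustrations, not a real API payload.

```python
from dataclasses import dataclass

MAX_REFERENCES = 3  # from the "up to three image inputs" tip


@dataclass(frozen=True)
class Reference:
    path: str  # file path or URL of the reference image
    role: str  # what it should keep consistent, e.g. "character", "product", "background"


def plan_ingredients(refs: list[Reference], scene: str) -> dict:
    """Check the reference count and bundle the references with the scene prompt.
    The returned dict is a hypothetical structure, not a real API payload."""
    if not 1 <= len(refs) <= MAX_REFERENCES:
        raise ValueError(f"expected 1 to {MAX_REFERENCES} reference images, got {len(refs)}")
    return {
        "references": [r.path for r in refs],
        "roles": [r.role for r in refs],
        "prompt": scene.strip(),
    }


job = plan_ingredients(
    [Reference("chef.png", "character"), Reference("kitchen.png", "background")],
    "The chef plates a dessert and smiles at the camera.",
)
print(job["roles"])  # ['character', 'background']
```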

First and Last Frame

You provide a starting image and an ending image, then prompt the model to generate the transition between them. Google’s Veo 3.1 guide explicitly describes this feature and recommends describing both the motion and the audio for the in-between sequence.

Use this when you want:

  • a controlled transformation between two states

  • a reveal from one composition to another

  • a precise transition arc

  • a stylized before/after movement
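Because the recommendation is to describe both the motion and the audio of the in-between sequence, a simple check can catch prompts that leave one of them out. This Python sketch is hypothetical. `FrameTransition` and its field names are illustrations, and the "Audio:" label is an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class FrameTransition:
    first_frame: str  # path to the starting image
    last_frame: str   # path to the ending image
    motion: str       # what happens between the two frames
    audio: str        # what the in-between sequence sounds like

    def validate(self) -> None:
        if self.first_frame == self.last_frame:
            raise ValueError("first and last frame should be different images")
        if not self.motion.strip() or not self.audio.strip():
            raise ValueError("describe both the motion and the audio of the transition")

    def prompt(self) -> str:
        """Validate, then return the motion line followed by an audio line."""
        self.validate()
        return f"{self.motion.strip()}\nAudio: {self.audio.strip()}"


t = FrameTransition(
    "bud.png", "bloom.png",
    "The rose bud slowly opens into full bloom, camera locked off.",
    "Soft wind, a faint rising string note.",
)
print(t.prompt())
```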

Example

For an in-depth guide to video generation workflows, see the Veo Prompt Guide.

Prompt Templates

  1. General Video Generation Template

  2. Product Hero with Native Audio

  3. Character with Dialogue

  4. First-Frame / Image-to-Video Variation
