JSON Video Prompt Templates for Veo 3 (Production-Ready, 2026)

Production-ready JSON prompt templates for Veo 3. Character-consistent multi-shot, dialogue scenes, ads, B-roll. Copy-paste with placeholders.

Nafiul Hasan
Founder, Prompt Architects

Published: 2026-08-07

TL;DR: Production JSON templates for Veo 3 — character-consistent multi-shot narratives, dialogue scenes, ads, B-roll. Copy-paste with placeholders. Tested patterns.

Why JSON for Veo 3

Veo 3's 6-part structure (Subject, Action, Scene, Camera, Lighting, Audio) maps cleanly to JSON keys. Three operational wins:

  1. Reuse: One character object referenced across 10 shots.
  2. Variables: Swap {{location}}, {{time_of_day}} for variant generation.
  3. Audit: Diff JSON like code; track what changed when output regressed.
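The audit point can be sketched in a few lines of Python. This is a minimal illustration, not a fixed workflow: `diff_prompts` and the two sample dicts are hypothetical, and the file names are only labels for the diff header.

```python
import difflib
import json


def diff_prompts(old: dict, new: dict) -> list[str]:
    """Return unified-diff lines between two prompt dicts."""
    # Sorted keys and fixed indentation give a stable text form,
    # so the diff shows only real changes, not key reordering.
    old_text = json.dumps(old, indent=2, sort_keys=True, ensure_ascii=False).splitlines()
    new_text = json.dumps(new, indent=2, sort_keys=True, ensure_ascii=False).splitlines()
    return list(
        difflib.unified_diff(
            old_text,
            new_text,
            fromfile="template_v2.json",
            tofile="template_v3.json",
            lineterm="",
        )
    )


# Hypothetical versions of the same template: only the lens changed.
v2 = {"camera": {"lens": "50mm prime", "movement": "slow dolly in"}, "duration_seconds": 8}
v3 = {"camera": {"lens": "35mm", "movement": "slow dolly in"}, "duration_seconds": 8}

for line in diff_prompts(v2, v3):
    print(line)
```

When a render regresses, the diff narrows the search to the fields that actually changed between versions.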

Template 1: Hero shot (single 8-second clip)

{
  "subject": {
    "description": "A 32-year-old woman with curly auburn hair, freckles, wearing a cream linen blazer over a white t-shirt and dark jeans",
    "distinguishing_features": "small silver pendant necklace, slight nose ring"
  },
  "action": "walks slowly toward camera, pauses, looks directly into lens with a confident half-smile",
  "scene": {
    "location": "modernist concrete-and-glass office lobby",
    "time": "late afternoon, golden hour",
    "weather": "clear, soft warm light streaming through floor-to-ceiling windows"
  },
  "camera": {
    "framing": "medium close-up, eye-level",
    "lens": "50mm prime",
    "movement": "slow dolly in, ending tight on face"
  },
  "lighting": "warm golden hour rim light from camera-right, soft fill from camera-left, subtle catchlight in eyes",
  "audio": {
    "dialogue": "none",
    "ambience": "soft city ambience, distant footsteps echoing on marble floor",
    "score": "subtle uplifting orchestral swell, building to a held note"
  },
  "duration_seconds": 8,
  "aspect_ratio": "16:9"
}

Use case: Brand hero shot, founder spotlight, product launch lead-in.

Template 2: Character-consistent multi-shot

For sequences spanning 5+ shots, define the character once and reuse it:

{
  "character_lock": {
    "name": "MAYA",
    "physical": "32, curly auburn hair shoulder-length, freckles, green eyes, 5'7\"",
    "wardrobe": "cream linen blazer, white t-shirt, dark jeans, white sneakers",
    "distinguishing_features": "small silver pendant necklace, slight nose ring, gestures with hands when speaking",
    "voice": "warm mid-range, slight rasp, speaks at measured pace"
  }
}

Then per-shot:

{
  "shot_id": "shot_03",
  "subject": "MAYA (re-describe full character_lock here verbatim — Veo 3 doesn't carry context)",
  "action": "sits at a wooden desk with laptop open, types a few words then leans back thinking, taps pen against chin",
  "scene": {
    "location": "minimalist home office, warm wood desk, single Eames chair",
    "time": "mid-morning",
    "weather": "soft overcast light through window"
  },
  "camera": {
    "framing": "wide shot, eye-level",
    "lens": "35mm",
    "movement": "static"
  },
  "lighting": "soft daylight from camera-left window, warm practical lamp on desk",
  "audio": {
    "dialogue": "none",
    "ambience": "soft keyboard typing, distant birds, clock ticking",
    "score": "minimal piano, contemplative mood"
  },
  "duration_seconds": 6,
  "aspect_ratio": "16:9",
  "seed": 42
}

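Critical: Re-describe the full character in every shot's subject field. Veo 3 generates each prompt independently, so context doesn't carry over. Keep the seed value (the "seed" field above) the same across shots to reduce face drift. A fixed seed helps, but it does not guarantee an identical face when the rest of the prompt changes.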
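Copying the character description into every shot by hand invites drift, so one option is to expand it in code. A rough sketch, assuming the character_lock object from Template 2; `describe`, `build_shot`, and the second shot are hypothetical and only illustrate the pattern.

```python
import json

CHARACTER_LOCK = {
    "name": "MAYA",
    "physical": "32, curly auburn hair shoulder-length, freckles, green eyes, 5'7\"",
    "wardrobe": "cream linen blazer, white t-shirt, dark jeans, white sneakers",
    "distinguishing_features": "small silver pendant necklace, slight nose ring, gestures with hands when speaking",
    "voice": "warm mid-range, slight rasp, speaks at measured pace",
}


def describe(character: dict) -> str:
    """Flatten a character_lock object into one subject string."""
    return (
        f"{character['name']}: {character['physical']}; "
        f"wearing {character['wardrobe']}; "
        f"{character['distinguishing_features']}; "
        f"voice: {character['voice']}"
    )


def build_shot(shot: dict, character: dict, seed: int) -> dict:
    """Return a copy of the shot with the full description and a shared seed."""
    return {**shot, "subject": describe(character), "seed": seed}


shots = [
    {"shot_id": "shot_03", "action": "sits at a wooden desk with laptop open, types a few words then leans back thinking"},
    {"shot_id": "shot_04", "action": "closes the laptop and stands up"},  # hypothetical shot
]
prompts = [build_shot(shot, CHARACTER_LOCK, seed=42) for shot in shots]
print(json.dumps(prompts[0], indent=2, ensure_ascii=False))
```

Every generated prompt carries the same verbatim description, so a wardrobe change is made once in `CHARACTER_LOCK` rather than in each shot file.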

Template 3: Dialogue scene

{
  "subject": {
    "description": "Two friends at a coffee shop. PERSON_A: 28, short dark hair, denim jacket. PERSON_B: 30, shaved head, navy hoodie",
    "distinguishing_features": "PERSON_A holds a latte, PERSON_B has a notebook open"
  },
  "action": "PERSON_A leans in and says (excitedly): 'Wait — that actually worked?' PERSON_B (laughing): 'Yeah, on the third try.' Both laugh, raise coffee cups in a small toast",
  "scene": {
    "location": "warm independent coffee shop, exposed brick, hanging plants",
    "time": "weekday morning",
    "weather": "soft natural light from large window"
  },
  "camera": {
    "framing": "two-shot medium, eye-level, slight over-the-shoulder bias toward PERSON_A",
    "lens": "35mm",
    "movement": "subtle handheld, organic slight sway"
  },
  "lighting": "natural window light from camera-right, warm amber bounce from interior",
  "audio": {
    "dialogue": "PERSON_A (excited): 'Wait — that actually worked?' PERSON_B (laughing): 'Yeah, on the third try.'",
    "ambience": "espresso machine hiss, distant chatter, soft jazz playing",
    "score": "none, naturalistic"
  },
  "duration_seconds": 8,
  "aspect_ratio": "16:9"
}

Why JSON for dialogue: Speaker attribution is unambiguous. Voice direction (excited, laughing) syncs with Veo 3's audio generation.
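Template 3 repeats the same lines in both the action and the audio.dialogue fields. A small helper can keep the two in sync; this is a sketch, and `dialogue_string` is a hypothetical name rather than anything Veo 3 requires.

```python
def dialogue_string(lines: list[tuple[str, str, str]]) -> str:
    """Format (speaker, direction, text) tuples as SPEAKER (direction): 'text'."""
    return " ".join(
        f"{speaker} ({direction}): '{text}'" for speaker, direction, text in lines
    )


lines = [
    ("PERSON_A", "excited", "Wait — that actually worked?"),
    ("PERSON_B", "laughing", "Yeah, on the third try."),
]

audio = {
    "dialogue": dialogue_string(lines),
    "ambience": "espresso machine hiss, distant chatter, soft jazz playing",
    "score": "none, naturalistic",
}
print(audio["dialogue"])
```

Building the string from structured tuples means a changed line or voice direction is edited in one place.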

Template 4: 30-second ad (4 stitched clips)

{
  "campaign": "Spring Skincare Launch",
  "shots": [
    {
      "shot_id": "01_hook",
      "duration_seconds": 6,
      "subject": "MAYA (curly auburn hair, freckles, 32) holding a glass skincare bottle to morning light",
      "action": "turns bottle slowly, light catches the liquid, soft smile breaks",
      "scene": "minimalist bathroom, white tile, soft morning light from window",
      "camera": "medium close-up, slow push in, 50mm",
      "lighting": "soft window light from camera-left, golden warm",
      "audio": {
        "dialogue": "MAYA (V.O., warm): 'It started with a question.'",
        "ambience": "morning birds, soft water trickling",
        "score": "gentle piano building"
      }
    },
    {
      "shot_id": "02_problem",
      "duration_seconds": 6,
      "subject": "MAYA (curly auburn hair, freckles, 32) at vanity mirror, examining her face with a disappointed micro-expression",
      "action": "leans in close to mirror, sighs, drops shoulders",
      "scene": "same bathroom, slightly different angle, mirror dominant",
      "camera": "medium, mirror reflection, 35mm, static",
      "lighting": "honest natural light, no flattering tricks",
      "audio": {
        "dialogue": "MAYA (V.O.): 'Why does my skin react to everything?'",
        "ambience": "muted, tense quiet",
        "score": "piano pauses, single held note"
      }
    },
    {
      "shot_id": "03_solution",
      "duration_seconds": 8,
      "subject": "Bottle with brand label rotating on white surface, ingredient overlay text appearing",
      "action": "bottle rotates 180°, label fully readable, ingredient names fade in over white",
      "scene": "studio white seamless backdrop",
      "camera": "macro lens, rotating subject, continuous",
      "lighting": "even soft studio light, no shadows",
      "audio": {
        "dialogue": "MAYA (V.O., relieved): 'Three ingredients. Nothing else.'",
        "ambience": "studio quiet",
        "score": "uplifting orchestral swell, building"
      }
    },
    {
      "shot_id": "04_resolution",
      "duration_seconds": 6,
      "subject": "MAYA (curly auburn hair, freckles, 32), smiling genuinely now, applying product",
      "action": "applies product to cheek, smiles into mirror, satisfied",
      "scene": "same bathroom, golden hour now, warm and bright",
      "camera": "medium close-up, slow pull back, 50mm",
      "lighting": "warm golden hour light, hopeful",
      "audio": {
        "dialogue": "MAYA (V.O.): 'Finally. Skincare that listens back.'",
        "ambience": "morning warmth, soft ambient",
        "score": "score resolves to held warm chord, brand sting"
      }
    }
  ],
  "post_production": {
    "stitch": "edit shots in order, dissolve 0.5s between each",
    "color_grade": "warm golden, slightly lifted shadows, brand-aligned palette",
    "end_card": "logo + brand URL, 2 seconds"
  }
}
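It is worth checking the runtime before rendering. The four shots above add up to 26 seconds of footage, so the finished cut lands a little under 30 seconds. The sketch below assumes each dissolve overlaps its two neighbouring clips for its full length and that the end card is appended without a dissolve; `total_runtime` is a hypothetical helper.

```python
def total_runtime(shot_seconds, dissolve=0.5, end_card=2.0):
    """Return (raw footage seconds, final cut seconds) for a stitched sequence."""
    footage = sum(shot_seconds)
    # One dissolve between each pair of adjacent shots; each overlap
    # shortens the cut by the dissolve length.
    overlap = dissolve * max(len(shot_seconds) - 1, 0)
    return footage, footage - overlap + end_card


footage, final_cut = total_runtime([6, 6, 8, 6])
print(footage, final_cut)  # 26 26.5
```

If the slot needs a full 30 seconds, lengthen individual shots or add a fifth clip rather than stretching one prompt.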

Template 5: B-roll texture pack

{
  "shot_id": "broll_01",
  "subject": "abstract liquid pour macro",
  "action": "thick honey-colored liquid pours slowly into a clear glass vessel, ripples expanding",
  "scene": "studio, white seamless backdrop",
  "camera": {
    "framing": "extreme macro",
    "lens": "100mm macro",
    "movement": "static, locked"
  },
  "lighting": "soft top light, slight side rake to reveal viscosity",
  "audio": {
    "dialogue": "none",
    "ambience": "subtle pour gurgle",
    "score": "none"
  },
  "duration_seconds": 6,
  "aspect_ratio": "16:9"
}

Generate 10 of these with variations: {{liquid_color}}, {{vessel_type}}, {{lighting_angle}}. Build a B-roll library.
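One way to produce those ten variants is to parameterize Template 5 and sample combinations. This is a sketch only: the variation values are made-up examples, and `broll_variants` is a hypothetical helper.

```python
import itertools
import json
import random

# Hypothetical variation values; swap in your own.
VARIATIONS = {
    "liquid_color": ["honey-colored", "deep amber", "milky white", "emerald green", "rose pink"],
    "vessel_type": ["a clear glass vessel", "a matte ceramic bowl"],
    "lighting_angle": ["side", "back"],
}


def broll_variants(count: int = 10, rng_seed: int = 0) -> list[dict]:
    """Pick `count` distinct combinations and fill them into Template 5."""
    keys = list(VARIATIONS)
    combos = list(itertools.product(*(VARIATIONS[k] for k in keys)))
    # A seeded RNG makes the selection repeatable between runs.
    picked = random.Random(rng_seed).sample(combos, count)
    variants = []
    for n, combo in enumerate(picked, start=1):
        v = dict(zip(keys, combo))
        variants.append({
            "shot_id": f"broll_{n:02d}",
            "subject": "abstract liquid pour macro",
            "action": f"thick {v['liquid_color']} liquid pours slowly into {v['vessel_type']}, ripples expanding",
            "scene": "studio, white seamless backdrop",
            "camera": {"framing": "extreme macro", "lens": "100mm macro", "movement": "static, locked"},
            "lighting": f"soft top light, slight {v['lighting_angle']} rake to reveal viscosity",
            "audio": {"dialogue": "none", "ambience": "subtle pour gurgle", "score": "none"},
            "duration_seconds": 6,
            "aspect_ratio": "16:9",
        })
    return variants


for prompt in broll_variants():
    print(json.dumps(prompt, ensure_ascii=False))
```

With five colors, two vessels, and two lighting angles there are 20 possible combinations, so asking for more than 20 raises a `ValueError`.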

Template 6: Stylized aesthetic

{
  "subject": "anime-style young woman with long pink hair, large green eyes, wearing white school uniform with red bow",
  "action": "stands on cliff edge, wind blowing her hair, looks toward distant city below, single tear rolls down cheek",
  "scene": {
    "location": "cliff overlooking neon-lit cyberpunk city at night",
    "time": "midnight",
    "weather": "light rain, atmospheric mist"
  },
  "camera": {
    "framing": "medium wide, eye-level, slight tilt up",
    "lens": "anime aesthetic, soft focus background",
    "movement": "slow camera pull back"
  },
  "lighting": "neon city glow from below in pink and cyan, moonlight rim from above",
  "audio": {
    "dialogue": "none",
    "ambience": "rain on cliff, distant city hum, wind",
    "score": "melancholic synthwave, slow tempo, emotional"
  },
  "style_anchor": "anime, illustrated 2D, Studio Ghibli meets cyberpunk",
  "duration_seconds": 8,
  "aspect_ratio": "16:9"
}

Note: Veo 3 can do stylized work but has a photoreal bias. For anime-heavy projects, Kling produces tighter results.

Variable injection patterns

For production pipelines, use placeholder syntax in templates:

{
  "subject": "{{character_name}}, {{age}}, {{hair_description}}, wearing {{wardrobe}}",
  "action": "{{primary_action}}",
  "scene": {
    "location": "{{location}}",
    "time": "{{time_of_day}}"
  },
  "camera": "{{camera_directive}}",
  "audio": {
    "dialogue": "{{character_name}} (V.O., {{tone}}): '{{line}}'"
  }
}

Inject variables at runtime. Build template once; generate 50 variants.
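Runtime injection could look like the sketch below. It walks the parsed template instead of doing string replacement on raw JSON text, so values containing quotes cannot break the JSON. `render` and the sample values are hypothetical; the `{{name}}` syntax is the one used above.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

TEMPLATE = {
    "subject": "{{character_name}}, {{age}}, {{hair_description}}, wearing {{wardrobe}}",
    "action": "{{primary_action}}",
    "scene": {"location": "{{location}}", "time": "{{time_of_day}}"},
    "camera": "{{camera_directive}}",
    "audio": {"dialogue": "{{character_name}} (V.O., {{tone}}): '{{line}}'"},
}


def render(node, values: dict):
    """Recursively replace {{name}} placeholders; a missing name raises KeyError."""
    if isinstance(node, dict):
        return {key: render(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [render(child, values) for child in node]
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), node)
    return node  # numbers, booleans, and null pass through unchanged


# Hypothetical values for one variant.
values = {
    "character_name": "MAYA",
    "age": 32,
    "hair_description": "curly auburn hair",
    "wardrobe": "cream linen blazer",
    "primary_action": "walks slowly toward camera",
    "location": "modernist office lobby",
    "time_of_day": "golden hour",
    "camera_directive": "medium close-up, 50mm, slow dolly in",
    "tone": "warm",
    "line": "It started with a question.",
}
print(json.dumps(render(TEMPLATE, values), indent=2, ensure_ascii=False))
```

Failing loudly on a missing variable is deliberate: an unfilled `{{location}}` sent to the model would otherwise be read as literal prompt text.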

Common mistakes

  1. Missing audio. Half of Veo 3's value. Specify even when "ambience only."
  2. Skipping character re-description in multi-shot. Veo 3 doesn't carry context. Re-describe every shot.
  3. Vague camera. "Camera moves" produces random results. Specify framing + lens + movement.
  4. No lighting direction. "Soft lighting" is generic. State source direction + temperature.
  5. Trying 30+ second narratives in one prompt. Doesn't work. Stitch shorter clips.
  6. Markdown code fences in the prompt input. Sometimes Veo parses them as content. Use plain JSON.
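Several of these mistakes can be caught mechanically before a prompt is submitted. The following is a rough pre-flight check, not an official validator: `lint_prompt` is hypothetical, and the 8-second limit is an assumption taken from the longest clips in the templates above. It covers mistakes 1, 3, 5, and 6; lighting quality still needs a human eye.

```python
import json

REQUIRED_FIELDS = ("subject", "action", "scene", "camera", "lighting", "audio")
CAMERA_PARTS = ("framing", "lens", "movement")
AUDIO_PARTS = ("ambience", "score")
MAX_CLIP_SECONDS = 8  # assumption: longest clip length used in the templates


def lint_prompt(raw: str) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    lines = raw.strip().splitlines()
    if lines and lines[0].startswith("```"):
        problems.append("markdown code fences present: send plain JSON")
        lines = [ln for ln in lines if not ln.strip().startswith("```")]
    try:
        prompt = json.loads("\n".join(lines))
    except json.JSONDecodeError as err:
        return problems + [f"invalid JSON: {err.msg}"]
    if not isinstance(prompt, dict):
        return problems + ["top level must be a JSON object"]

    for field in REQUIRED_FIELDS:
        if field not in prompt:
            problems.append(f"missing field: {field}")
    # Sub-field checks apply only when the field is an object, since some
    # templates pass camera as a single string.
    camera = prompt.get("camera")
    if isinstance(camera, dict):
        problems += [f"camera missing: {p}" for p in CAMERA_PARTS if p not in camera]
    audio = prompt.get("audio")
    if isinstance(audio, dict):
        problems += [f"audio missing: {p}" for p in AUDIO_PARTS if p not in audio]
    if prompt.get("duration_seconds", 0) > MAX_CLIP_SECONDS:
        problems.append("clip too long: split into shorter clips and stitch")
    return problems


draft = '{"subject": "x", "action": "y", "scene": "z", "camera": {"framing": "wide"}, "lighting": "soft", "duration_seconds": 30}'
for problem in lint_prompt(draft):
    print(problem)
```

Running a check like this in the pipeline turns the list above into a gate rather than a reminder.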

Production tips

  • Version templates. template_v3.json. Diff when output regresses.
  • Lock seeds for character series. Use the same seed value (for example 42) across all shots.
  • Render reference frames first. Generate a short test clip before committing to full 8-second renders.
  • Build a JSON template library. 20 reliable templates beat improvising every shoot.

What to do next

  1. Pick one template above matching your most common need.
  2. Customize with your character + scene.
  3. Render 3 variants. Pick the best.
  4. Save as your starter template. Iterate from there.
  5. Build a 10-template library over the next month.

Tools that ship JSON video prompt templates as one-click presets, such as Prompt Architects, save you from retyping the structure for repeated work. The templates above transfer directly.

Frequently asked questions

Why use JSON prompts instead of natural language for Veo 3?
Three reasons. (1) Character consistency across shots — define the subject once, reference everywhere. (2) Reproducibility — the same JSON produces predictable variants. (3) Templating at scale — swap variables in production pipelines without rewriting prose. For one-off cinematic shots, natural language is fine. For series, ads, or multi-shot narratives, JSON wins.
Does Veo 3 actually parse JSON syntax?
Yes. Veo 3 (April 2026) reliably parses structured JSON in the prompt field. The model treats keys as semantic anchors. Use plain JSON without code fences in the prompt input. Wrapping the JSON in markdown code fences often works, but Veo sometimes parses the fences as content, so plain JSON is the safer default.
What fields matter most in a Veo 3 JSON prompt?
Subject (with explicit physical descriptors and wardrobe), camera (framing + lens + movement), lighting (source + direction + mood), audio (dialogue + ambience + score), and scene (location + time + weather). Action and motion are also critical. Omit any field at your peril — Veo 3 fills gaps with default house aesthetic.
How do I keep characters consistent across 5+ shots?
Define the character object once with all physical and wardrobe descriptors. Reference the character by name in each shot's subject field and repeat the full description (Veo 3 doesn't carry context across prompts). Add a 'distinguishing_features' field with 2-3 unique attributes. Keep the same seed value across shots to reduce face drift between variants.
Should I include audio in every shot?
Yes for Veo 3 — audio is its differentiator. Even ambient-only shots benefit from explicit ambience and score guidance. Skipping audio cues forfeits the model's edge over Sora and Kling. Bare minimum: ambience + score mood, even on dialogue-free clips.