Video · Updated June 10, 2026 · 20 min read

30 Cinematic Camera Prompts for Veo 3 and Kling AI (2026)

30 tested camera prompts for Veo 3 and Kling AI video generation. Framing, movement, lens, mood — copy-paste ready, with shot-type breakdown.

Nafiul Hasan
Founder, Prompt Architects

TL;DR: These 30 copy-paste camera prompts for Veo 3 and Kling AI are sorted by shot type, each with the exact camera-modifier breakdown so you can swap in your own subject. Veo 3 and Kling both parse real cinematography vocabulary — dolly, tracking, crane, Dutch tilt, 35mm — and the sweet spot is two to three modifiers per shot, with one dominant camera movement.

What are the best cinematic camera prompts for Veo 3 and Kling AI?

The best cinematic camera prompts for Veo 3 and Kling AI name one shot type, one camera movement, and one lens — for example "medium close-up, slow dolly in, 35mm lens." Both models were trained on professional film footage and parse real cinematography terms, so naming a "dolly shot" instead of "camera moves forward" gives you the emotion and pacing of that move. Keep it to two or three modifiers per clip.

That single rule — name the real technique, then stop at three modifiers — is what separates a usable shot from a muddy one. The rest of this guide gives you 30 ready-made camera prompts built on that rule, organized by the role each shot plays in a sequence, plus the structural tricks that make Veo 3 and Kling actually obey you.

Why does cinematography vocabulary work in AI video models?

Veo 3 and Kling did not learn "camera language" as abstract instructions. They learned it by watching millions of professionally shot clips that were captioned with industry terms. So when you write "crane shot" or "shallow depth of field," you are activating a dense cluster of learned associations: the typical pacing of that move, the lens that usually shoots it, the lighting it pairs with, and the emotion it carries.

Google's own engineering guidance makes this explicit. Their ultimate prompting guide for Veo 3.1 recommends specific cinematographic language — dolly shot, tracking shot, crane shot, aerial view, slow pan, POV shot, 180-degree arc — because the model maps each term to a concrete visual and emotional pattern. Vague phrasing like "make it look cool" gives the model nothing to anchor to.

The same is true for Kling. Its official camera-movement guidance lists six core motions — pan, tilt, zoom, tracking/dolly, roll, and pedestal — and recommends a prompt order that ends with the camera instruction so the model builds the scene first, then moves through it. Different model, same principle: real terms beat vague description.

The headline number: In Prompt Architects' internal test of 200 prompts across Veo 3 and Kling, the optimal count was two to three camera modifiers per shot. Beyond three, the model tends to average the conflicting instructions and the output gets muddy. This matches Google's published advice to choose one primary camera move per clip.

If you want the deeper mechanics of why structured prompts outperform casual ones across every model, see our breakdown of why structured prompts beat plain prompts.

How should you structure a Veo 3 or Kling prompt?

Both models reward a clear, ordered prompt — but the recommended order differs slightly, and that difference matters.

Google's published structure for Veo 3.1 is a five-part formula:

[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]

Cinematography leads. You define the camera work and shot composition first, then the subject, what it does, the environment, and finally the aesthetic and lighting. Veo 3.1 also adds native audio, so in practice you often append a sixth element — dialogue, sound effects, or ambient noise — using the conventions Google documents.

Kling's official formula puts the camera last:

Subject + Environment + Action + Lighting + Style + Camera Movement

The reasoning, per Kling's guidance, is that putting the camera instruction at the end "ensures the AI builds the scene first before trying to move through it." In testing, both orders work, but if you are getting a Kling output where the camera move overwhelms a half-built scene, move your camera clause to the end.

Here is the same shot written for both models so you can see the structural difference:

Veo 3 (camera-first):

Slow dolly push from a medium shot to a close-up, 35mm lens.
A weathered fisherman mends a net on a wooden dock.
Cold morning light, low fog over grey water, muted teal palette.
SFX: gulls, distant waves, the creak of rope.

Kling (camera-last):

A weathered fisherman mends a net on a wooden dock.
Cold morning, low fog over grey water, muted teal palette, realistic film style.
Camera: slow dolly push from medium shot to close-up.

Element                 | Veo 3 order                        | Kling order
------------------------|------------------------------------|----------------------
Camera / cinematography | First                              | Last
Subject                 | Second                             | First
Action                  | Third                              | Third
Context / environment   | Fourth                             | Second
Style & lighting        | Fifth                              | Fourth–fifth
Native audio            | Supported (dialogue, SFX, ambient) | Not the model's focus
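
If you keep shots as structured data, the two orders can be generated from one source. The sketch below is one possible way to do that in Python; the `Shot` class and both function names are hypothetical, and subject and action are merged into one field for brevity, so treat it as an illustration of the ordering rather than either vendor's official format.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical container for the parts of one shot description."""
    camera: str
    subject_action: str
    environment: str
    style: str = ""
    audio: str = ""  # only used by the Veo variant in this sketch


def veo_prompt(shot: Shot) -> str:
    """Camera-first order, with any audio cue appended as the last line."""
    parts = [shot.camera, shot.subject_action, shot.environment, shot.style, shot.audio]
    return "\n".join(p for p in parts if p)


def kling_prompt(shot: Shot) -> str:
    """Camera-last order; the audio cue is left out."""
    parts = [shot.subject_action, shot.environment, shot.style, f"Camera: {shot.camera}"]
    return "\n".join(p for p in parts if p)


fisherman = Shot(
    camera="Slow dolly push from a medium shot to a close-up, 35mm lens.",
    subject_action="A weathered fisherman mends a net on a wooden dock.",
    environment="Cold morning light, low fog over grey water, muted teal palette.",
    style="Realistic film style.",
    audio="SFX: gulls, distant waves, the creak of rope.",
)

print(veo_prompt(fisherman))
print()
print(kling_prompt(fisherman))
```

Because the camera clause lives in one field, moving it to the end for Kling is a change of order, not a rewrite.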

For the full templating system — including reusable Global Variables you can drop into either model — see our Veo 3 prompt structure guide.

How to read these 30 prompts

Each entry below gives you three things:

  • Shot type — the role this shot plays in a sequence (establishing, close-up, kinetic, action, mood, specialty).
  • Camera language — the specific modifiers that produce the look.
  • Full prompt — drop it into Veo 3 or Kling and replace the subject placeholders.

The prompts focus on the camera section. Pair each with your own subject, scene, lighting, and (for Veo 3) audio. All 30 stay within the two-to-three-modifier rule, so they generate cleanly in both models with minor adjustment.

Establishing shots — 5 camera prompts

Establishing shots orient the viewer. They answer where are we? before the story narrows in. These five give you scale, geography, and mood in a single move.

1. Wide environment reveal

Camera: Wide static establishing shot, 24mm lens, eye-level, no movement.
Holds for full duration. Subject occupies bottom third of frame, environment dominates.

2. Slow drone descent

Camera: Aerial wide shot, slow vertical descent from 200ft to 50ft, smooth gimbal motion.
28mm lens. Reveals subject as the camera lowers.

3. Crane up with reveal

Camera: Starts at ground level on the subject's feet. Slow vertical crane up to eye level.
35mm lens. Pace: roughly 1ft per second rise.

4. Long-lens compression establishing

Camera: Long lens (135mm), telephoto compression. Static. Subject in the middle distance,
foreground and background equally compressed.

5. Push-in establishing

Camera: Slow dolly push from wide to medium shot over the full duration. 35mm.
Subject framed center. Environment narrows as the camera approaches.

Why these work: Establishing shots favor wide-to-normal lenses (24-35mm) for breadth or long lenses (135mm) for compression. Movement, if any, is slow and singular: a descent, a crane, a push. Aerial and crane moves are explicitly in Google's recommended movement list, so Veo 3 renders them reliably.

Character close-ups — 5 camera prompts

Close-ups carry emotion. The lens choice here is doing most of the work: longer lenses flatten and flatter the face, shorter ones add intimacy and slight distortion.

6. Standard medium close-up

Camera: Medium close-up, 35mm lens, eye-level, slight rack focus onto the subject's face.
Static framing.

7. Extreme close-up emotion

Camera: Extreme close-up on the eyes, 85mm lens, shallow depth of field (f/1.4 feel).
Slight handheld tremor.

8. Over-the-shoulder dialogue

Camera: Over-the-shoulder framing, 50mm lens, focus on the receiving subject.
Slight angle so both faces are partially visible. Static.

9. Profile silhouette

Camera: Side-profile medium shot, 50mm lens, subject backlit so the silhouette dominates.
Static.

10. Slow orbit close-up

Camera: Medium close-up, 35mm lens, slow circular orbit (clockwise, front to side, 90°).
Subject stays centered.

Why these work: The 85mm "shallow depth of field" pairing in prompt #7 maps directly onto Google's documented lens and focus options — shallow depth of field, macro lens, soft focus — which the model treats as distinct, learnable looks. The orbit in #10 is a single dominant move, well within the modifier budget.

Movement and kinetic shots — 5 camera prompts

Kinetic shots follow a moving subject. This is where Kling often shines: its motion-control system was built to track subjects through space, and explicit movement verbs give it a clean path to follow.

11. Side tracking walk

Camera: Medium tracking shot from the subject's right side.
Camera moves at the subject's walking speed. 35mm lens, slight handheld feel.

12. Behind-the-back follow

Camera: Medium shot from behind the subject, 35mm lens, gimbal-smooth, follows at 3ft distance.
Subject's head and shoulders fill the frame.

13. Low-angle hero walk

Camera: Low angle (3ft above ground), 24mm wide lens, tracking forward as the subject walks
toward camera. Subject grows in frame.

14. Whip-pan transition

Camera: Static medium shot for 1 second, then a fast horizontal whip pan (180° in 0.3s)
revealing a new subject. 35mm lens.

15. Steadicam first-person

Camera: First-person POV, 24mm lens, gimbal-smooth, walking pace forward through the environment.
Subtle vertical bob mimicking footsteps.

Why these work: Tracking shots are in both models' core vocabularies. For Kling specifically, name the motion explicitly and keep it simple — its guidance notes that plain directional language beats ornate cinematic phrasing for stable motion. The POV in #15 maps to Google's listed "POV shot."

Action and dynamic shots — 5 camera prompts

Action shots add energy through speed and instability. The trick is to specify the kind of energy: a controlled orbit reads very differently from deliberate handheld shake.

16. Handheld run

Camera: Handheld medium shot from the front, 35mm lens, camera retreats backward ahead of the running subject.
Significant camera shake matching footfall rhythm.

17. Slow-motion impact

Camera: Medium shot, 50mm lens, locked-off static. 60fps slow motion (~0.4× speed at 24fps playback).
Subject performs a single action.

18. 360° circular orbit

Camera: Medium shot, 35mm lens, smooth gimbal orbit, full 360° around the subject.
Pace: one complete revolution over the duration. Subject stays centered.

19. Crash zoom

Camera: Wide static shot for 1 second, then a rapid zoom to medium close-up over 0.5 seconds.
24mm to 85mm equivalent.

20. Top-down chase

Camera: Top-down aerial, 35mm lens, follows the subject moving across the environment.
Camera maintains ~50ft altitude. Subject occupies the center-bottom third.

Why these work: The full orbit in #18 extends Google's documented "180-degree arc shot" to a complete revolution. For the crash zoom in #19 and whip pan in #14, the timing notation (over 0.5 seconds, 180° in 0.3s) gives the model a clear pace, which is exactly the kind of explicit movement path Google's guide recommends.
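
The timing numbers in these prompts are simple arithmetic, and it can help to sanity-check them before you generate. The helpers below are a minimal sketch with hypothetical names; the 8-second clip length and the 24fps playback rate are assumptions, so substitute your own values.

```python
def angular_speed(degrees: float, seconds: float) -> float:
    """Average rotation speed in degrees per second."""
    return degrees / seconds


def slow_motion_factor(capture_fps: float, playback_fps: float = 24.0) -> float:
    """Playback speed relative to real time when footage is conformed to playback_fps."""
    return playback_fps / capture_fps


CLIP_SECONDS = 8  # assumed clip length; change to match your generation settings

# Whip pan from #14: 180 degrees in 0.3 seconds.
print(round(angular_speed(180, 0.3)), "deg/s")

# Full orbit from #18 spread over the assumed clip length.
print(round(angular_speed(360, CLIP_SECONDS)), "deg/s")

# Slow motion from #17: 60fps footage played back at an assumed 24fps.
print(round(slow_motion_factor(60), 2), "x real time")
```

A whip pan and a slow orbit differ by more than an order of magnitude in rotation speed, which is why stating the duration matters more than the adjective.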

Mood and atmospheric shots — 5 camera prompts

Mood shots use framing and optics to create feeling rather than information. A Dutch tilt signals unease; an anamorphic flare signals scale and polish. These are where "show the technique, not the word cinematic" pays off most.

21. Dutch tilt unease

Camera: Medium close-up, 35mm lens, 15° Dutch tilt. Static framing.
Subject offset to the right, leaning into the tilt.

22. Dolly zoom (Vertigo effect)

Camera: Medium shot, dolly forward while zooming out. 70mm to 35mm equivalent.
Background appears to stretch while the subject stays the same size.

23. Anamorphic compression

Camera: Wide shot, 2.35:1 anamorphic lens look, blue horizontal lens flare from an off-screen
light source. Static framing.

24. Foreground silhouette frame

Camera: Medium shot, 50mm lens, a sharp silhouetted foreground element (door frame, branches)
framing the subject in the middle distance.

25. Reflection in a surface

Camera: Tight close-up on a reflective surface (window, water, mirror). Subject visible in the
reflection. 50mm lens.

Why these work: These prompts name precise optical techniques — Dutch tilt, dolly zoom, anamorphic flare — instead of the generic "cinematic." That specificity is exactly what gives the model control. The dolly zoom in #22 is two simultaneous moves, which is the upper bound of what a single short clip handles cleanly; if it muddies, split it.
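
For the dolly zoom, the two moves have to cancel out for the subject to hold its size. Under a simple thin-lens approximation, subject size in frame scales with focal length divided by camera distance, so the focal length must change in proportion to the distance. The function below sketches that relationship; its name and the example distances are hypothetical.

```python
def matched_focal_length(f_start_mm: float, d_start_m: float, d_end_m: float) -> float:
    """Focal length that keeps the subject the same size after the camera moves.

    Approximation: subject size in frame is proportional to focal_length / distance.
    """
    if d_start_m <= 0 or d_end_m <= 0:
        raise ValueError("distances must be positive")
    return f_start_mm * d_end_m / d_start_m


# Dolly in from 4 m to 2 m (assumed distances): the lens has to get wider.
print(matched_focal_length(70, 4.0, 2.0))

# Dolly out from 2 m to 4 m: the lens has to get tighter.
print(matched_focal_length(35, 2.0, 4.0))
```

In prompt terms, a forward dolly pairs with a zoom out toward the shorter focal length, and a backward dolly pairs with a zoom in toward the longer one.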

Specialty shots — 5 camera prompts

Specialty shots are the finishing details: inserts, textures, and tricks that make a sequence feel authored rather than generated.

26. Macro detail insert

Camera: Extreme close-up, macro lens, shallow depth of field. Static.
Subject (object detail, hand, texture) fills the frame.

27. Time-lapse interior

Camera: Wide static interior, 24mm lens. Time-lapse mode (renders as ~4× speed).
Subject moves through the environment naturally.

28. Split diopter

Camera: Medium shot, 50mm lens, split-diopter effect. Foreground subject and background subject
both in sharp focus simultaneously.

29. Through-the-glass shot

Camera: Medium shot through a window or glass surface. 50mm lens.
Subject visible through a partially reflective surface, slight haze on the glass.

30. Tilt-shift plane of focus

Camera: Medium close-up, 35mm lens, tilted plane of focus. Subject sharp in a narrow band,
the rest of the frame falling into soft blur.

Why these work: The macro insert in #26 uses Google's documented "macro lens" option. Split diopter (#28) and tilt-shift (#30) are advanced optical effects — they don't always land on the first generation, so treat them as candidates to regenerate two or three times rather than guaranteed one-shots.

Which proven modifier combinations should you use?

After hundreds of generations, certain pairings produce a reliable look every time. Use this table as a quick lookup when you know the feeling you want but not the exact camera spec.

Pairing                                   | Result
------------------------------------------|------------------------
Medium CU + 35mm + handheld               | Documentary, intimate
Wide + 24mm + low angle                   | Hero shot
Long lens (85mm+) + shallow DoF + backlit | Editorial portrait
Steadicam + 24mm + tracking               | Smooth narrative follow
Top-down + static + symmetrical           | Wes Anderson framing
Dutch tilt + 35mm + low light             | Tension / unease
Macro + shallow DoF + side-lit            | Product hero
Aerial wide + slow descent + golden hour  | Lifestyle establish
Dolly zoom + 35–70mm + locked subject     | Vertigo / dread
Orbit + 35mm + centered subject           | Reveal / showcase

Each of these stays at three modifiers or fewer — framing, lens, and one move or lighting cue. That is not a coincidence. It is the constraint that keeps output clean. If you want to store these as one-click templates you can reuse across projects, our prompt library and Global Variables workflow is built exactly for that.
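
One way to keep this lookup at hand is a plain dictionary keyed by the feeling you want. The sketch below copies the pairings from the table; the `PAIRINGS` and `camera_line` names are hypothetical, and the output is only the camera clause, not a full prompt.

```python
# Feeling -> (modifier, modifier, modifier), taken from the pairing table.
PAIRINGS = {
    "documentary, intimate": ("medium close-up", "35mm", "handheld"),
    "hero shot": ("wide", "24mm", "low angle"),
    "editorial portrait": ("long lens (85mm+)", "shallow depth of field", "backlit"),
    "smooth narrative follow": ("Steadicam", "24mm", "tracking"),
    "wes anderson framing": ("top-down", "static", "symmetrical"),
    "tension / unease": ("Dutch tilt", "35mm", "low light"),
    "product hero": ("macro", "shallow depth of field", "side-lit"),
    "lifestyle establish": ("aerial wide", "slow descent", "golden hour"),
    "vertigo / dread": ("dolly zoom", "35-70mm", "locked subject"),
    "reveal / showcase": ("orbit", "35mm", "centered subject"),
}


def camera_line(feeling: str) -> str:
    """Return a camera clause for a feeling; raises KeyError if it is not in the table."""
    modifiers = PAIRINGS[feeling.lower()]
    return "Camera: " + ", ".join(modifiers) + "."


print(camera_line("Hero shot"))
print(camera_line("Tension / unease"))
```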

How do Veo 3 and Kling differ in practice?

They share a vocabulary, but they have different strengths. Knowing which to reach for saves a lot of wasted credits.

Veo 3.1 generates 4, 6, or 8-second clips at 720p, 1080p, or 4K with synchronized native audio — dialogue, sound effects, and ambient soundscapes generated directly inside the video, per Google DeepMind. It also supports reference images (up to three) for character consistency, first-and-last-frame transitions, and scene extension up to roughly 141 seconds. If your shot needs sound, tight static framing, or dialogue lip-sync, Veo 3 is usually the stronger pick.

Kling leans into motion. Its Professional Mode exposes manual camera sliders — horizontal, vertical, and zoom values, typically kept between 1 and 3 for smooth motion — that give you frame-accurate camera control text alone can't match, according to its camera-movement guide. Newer Kling releases add parallax depth so background elements shift naturally at different speeds based on distance. For dynamic orbits, follows, and tracking moves, Kling frequently wins.

Capability                | Veo 3.1                         | Kling
--------------------------|---------------------------------|-------------------------
Native synchronized audio | Yes (dialogue, SFX, ambient)    | Not the focus
Manual camera sliders     | No                              | Yes (Professional Mode)
Max single-clip length    | 8s (extend to ~141s)            | 5s or 10s by mode
Max resolution            | Up to 4K                        | High-res by tier
Reference images          | Up to 3                         | Yes, varies by version
Best for                  | Audio, dialogue, static framing | Motion, orbits, follows

A practical workflow: prototype the camera move in whichever model is cheaper for you, lock the framing and motion you like, then regenerate the keeper in the model whose strength matches the shot. For a side-by-side on choosing the right tool per shot, see our Veo 3 vs Kling comparison.
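
The "which model for which shot" decision can be written down as a small rule of thumb. The function below is a rough sketch of the strengths described in this section, not a benchmark; the function name and the set of motion keywords are assumptions you should adjust to your own results.

```python
# Moves this guide describes as Kling's strength (assumed keyword set).
MOTION_HEAVY = {"orbit", "follow", "tracking"}


def suggest_model(needs_audio: bool, dominant_move: str) -> str:
    """Suggest a model for the final render of one shot."""
    move = dominant_move.lower()
    if needs_audio:
        return "Veo 3.1"  # native dialogue, SFX, and ambient audio
    if move in MOTION_HEAVY:
        return "Kling"    # motion, orbits, follows
    if move == "static":
        return "Veo 3.1"  # tight static framing
    return "either"       # no clear winner; prototype in the cheaper one


print(suggest_model(needs_audio=True, dominant_move="orbit"))
print(suggest_model(needs_audio=False, dominant_move="orbit"))
print(suggest_model(needs_audio=False, dominant_move="crane"))
```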

How do you get multiple shots in one generation?

Veo 3.1 supports timestamp prompting, which lets you script a mini-sequence inside a single 8-second clip. Google's guide gives this exact pattern:

[00:00-00:02] Medium shot from behind the explorer, pushing a vine aside
[00:02-00:04] Reverse shot of the explorer's face, an expression of awe
[00:04-00:06] Tracking shot following the explorer along a stone wall
[00:06-00:08] Wide crane shot revealing the temple complex

This is the single highest-leverage technique in the guide for narrative work. Instead of generating four clips and editing them together, you describe four camera beats with timecodes and let the model cut between them. Each beat still follows the one-dominant-move rule — a reverse shot, then a tracking shot, then a crane — so the modifier budget applies per beat, not per clip.
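
If you script sequences often, the timecodes can be generated from a list of beats. The helper below is a minimal sketch that splits a clip into equal beats in the bracketed format shown above; the function name is hypothetical, and equal spacing is an assumption since nothing requires beats to be the same length.

```python
def timestamp_prompt(beats: list[str], clip_seconds: int = 8) -> str:
    """Prefix each beat with an evenly spaced [MM:SS-MM:SS] range."""
    if not beats:
        raise ValueError("need at least one beat")
    if clip_seconds % len(beats) != 0:
        raise ValueError("clip length must divide evenly by the number of beats")
    step = clip_seconds // len(beats)
    lines = []
    for i, beat in enumerate(beats):
        start, end = i * step, (i + 1) * step
        lines.append(f"[00:{start:02d}-00:{end:02d}] {beat}")
    return "\n".join(lines)


explorer = [
    "Medium shot from behind the explorer, pushing a vine aside",
    "Reverse shot of the explorer's face, an expression of awe",
    "Tracking shot following the explorer along a stone wall",
    "Wide crane shot revealing the temple complex",
]

print(timestamp_prompt(explorer))
```

Keeping beats as a list also makes it easy to check that each one names a single dominant move before the timecodes are added.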

Kling approaches the same goal through multi-shot generation in its newer models and through stitching clips with consistent reference images. When you need more than eight seconds, plan your sequence as a shot list first, generate each shot with one clear move, and assemble.

What are the most common camera-prompt mistakes?

These five errors account for most "why does my AI video look wrong?" frustration. Fix them and your hit rate jumps immediately.

  1. Mixing framing words. "Wide shot close-up zoom" gives the model three contradictory instructions. Pick one framing per shot.
  2. Skipping the lens. "35mm" and "85mm" produce meaningfully different images. No lens spec means the model guesses, and the guess is rarely what you pictured.
  3. More than three modifiers. This is the big one. Google explicitly warns that requesting pan, tilt, orbit, zoom, and handheld in one short clip produces confused output. Stay tight; one dominant move.
  4. No movement specified. If you don't say "static" or "locked-off," the model adds drift, sway, or a slow push on its own. State it when you want a held frame.
  5. Forgetting subject framing. Camera language alone doesn't position the subject. Add "subject occupies the right third of frame" or similar so the composition is intentional.

A sixth, subtler mistake: overusing the word "cinematic." It produces a generic, over-graded look. Name the real technique — film grain, shallow depth of field, anamorphic flare, golden-hour backlight — and you get control instead of a preset vibe. For more on writing prompts that read as deliberate rather than default, see our AI video prompting mistakes post.
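
These checks are mechanical enough to automate before you spend credits. The linter below is a rough sketch: the vocabularies are small, hypothetical lists, the matching is keyword-based, and it will miss or misread phrasing it has not been told about, so treat its warnings as prompts to re-read, not verdicts.

```python
import re

# Small hypothetical vocabularies; extend them with the terms you actually use.
FRAMINGS = ["extreme close-up", "medium close-up", "close-up", "medium shot", "wide shot"]
MOVES = ["dolly zoom", "crash zoom", "whip pan", "dolly", "tracking", "crane",
         "orbit", "pan", "tilt", "zoom"]
HOLDS = ["static", "locked-off", "no camera movement", "no movement"]


def find_terms(text: str, terms: list[str]) -> list[str]:
    """Return terms present in text, longest first, so "medium close-up"
    is not also counted as "close-up"."""
    found = []
    for term in sorted(terms, key=len, reverse=True):
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, text):
            found.append(term)
            text = re.sub(pattern, " ", text)
    return found


def lint_camera_prompt(prompt: str) -> list[str]:
    """Return a list of warnings for the common camera-prompt mistakes."""
    # A Dutch tilt is a framing angle, not a tilt move, so drop it before matching.
    text = prompt.lower().replace("dutch tilt", " ")
    warnings = []
    if len(find_terms(text, FRAMINGS)) > 1:
        warnings.append("mixed framing: pick one framing per shot")
    if not re.search(r"\b\d{2,3}mm\b", text) and "macro lens" not in text:
        warnings.append("no lens: add a focal length such as 35mm")
    moves = find_terms(text, MOVES)
    if len(moves) > 1:
        warnings.append("multiple moves: keep one dominant camera move")
    if not moves and not find_terms(text, HOLDS):
        warnings.append("no movement stated: write 'static' for a held frame")
    if not re.search(r"\b(third|centered|center|offset)\b", text):
        warnings.append("no subject framing: say where the subject sits")
    if "cinematic" in text:
        warnings.append("'cinematic' is vague: name the technique instead")
    return warnings


for warning in lint_camera_prompt("Wide shot close-up zoom"):
    print(warning)
```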

How do you turn these prompts into a repeatable system?

One-off prompts are fine for experiments. For real production you want a system, because the value of these 30 shots compounds when you can recall and adapt them in seconds.

A practical build order:

  • Pick your 5-8 most-used shots from the 30 above — usually a wide establish, a tracking follow, a medium close-up, an orbit, and a static insert.
  • Convert subject details into variables. Replace "the subject" with a placeholder you fill per project, so the camera spec stays fixed and only the content changes.
  • Save both model variants — the Veo camera-first version and the Kling camera-last version — under the same shot name.
  • Tag by emotion as well as shot type. "Tension" should surface the Dutch tilt and dolly zoom; "reveal" should surface the crane and orbit.
  • Log what actually worked. After each generation, note the seed or settings that produced the keeper. Your second pass at any shot should be faster than the first.
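
The build order above can be prototyped with nothing more than Python's standard library. This sketch stores two of the 30 shots as templates with a `$subject` placeholder, tags them by emotion, and keeps a simple log of keepers; the dictionary layout, shot names, tags, and log fields are all hypothetical and are not how any particular tool stores prompts.

```python
from string import Template

# Hypothetical shot library: the camera spec stays fixed, only $subject changes.
SHOT_LIBRARY = {
    "side-tracking-walk": {
        "tags": ["kinetic", "follow"],
        "camera": Template(
            "Camera: Medium tracking shot from the right side of $subject. "
            "Camera moves at walking speed. 35mm lens, slight handheld feel."
        ),
    },
    "dutch-tilt-unease": {
        "tags": ["mood", "tension"],
        "camera": Template(
            "Camera: Medium close-up of $subject, 35mm lens, 15° Dutch tilt. "
            "Static framing. Subject offset to the right, leaning into the tilt."
        ),
    },
}

generation_log: list[dict] = []


def render(shot_name: str, subject: str) -> str:
    """Fill the subject placeholder for one stored shot."""
    return SHOT_LIBRARY[shot_name]["camera"].substitute(subject=subject)


def shots_tagged(tag: str) -> list[str]:
    """Names of stored shots carrying the given tag."""
    return [name for name, shot in SHOT_LIBRARY.items() if tag in shot["tags"]]


def log_keeper(shot_name: str, model: str, notes: str) -> None:
    """Record which shot, model, and settings produced a usable result."""
    generation_log.append({"shot": shot_name, "model": model, "notes": notes})


print(render("side-tracking-walk", "a courier in a yellow raincoat"))
print(shots_tagged("tension"))
log_keeper("side-tracking-walk", "Kling", "second attempt was the keeper")
```

`Template.substitute` raises `KeyError` when a placeholder is left unfilled, which is a useful guard against sending a prompt with a literal `$subject` in it.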

This is precisely the loop a tool like Prompt Architects is built to remove friction from — one-click enhancement turns a rough idea into a structured Veo 3 or Kling prompt, the prompt library stores your keepers, and Global Variables swap subjects in and out without retyping the camera spec. But the patterns matter more than any tool. Master the two-to-three-modifier rule, learn which move carries which emotion, and you can write a clean cinematic prompt for Veo 3, Kling, Sora, or Runway from a blank box.

What to do next

Pick three shots from the list. Adapt the subject, scene, lighting, and audio for your project. Generate. Note which combinations produce what you wanted and build your personal preset library from the winners. Then test the same shot across Veo 3 and Kling so you learn each model's tendencies firsthand — Veo for audio and tight framing, Kling for motion.

The vocabulary in these 30 prompts transfers between every modern video model with minor adjustment. The discipline — one shot type, one move, one lens — is what makes any of them work.

Frequently asked questions

What camera modifiers does Veo 3 understand? Veo 3 was trained on professional film footage, so it parses real cinematography vocabulary: wide shot, medium close-up, dolly in, tracking shot, crane shot, aerial view, low angle, Dutch tilt, and lens specs like 35mm or 85mm. Google's official guide recommends choosing one primary camera movement per clip; mixing 2-3 modifiers is the practical sweet spot before output gets muddy.

Does Kling AI parse camera language the same way? Mostly, with differences. Kling responds well to explicit, simple motion terms (push in, pull back, orbit, pan, tilt) and Google-style cinematography vocabulary, but Kling's official formula puts the camera instruction last so the model builds the scene before moving through it. Kling also offers manual camera sliders in Professional Mode for frame-accurate control that text alone can't match.

How many camera modifiers should I include per prompt? Two to three at most. One framing, one movement, and optionally one lens. Google explicitly warns that asking for pan, tilt, orbit, zoom and handheld in one short clip produces confused output. Short AI clips need a clear movement hierarchy with one dominant move.

Do I need the word "cinematic" in every prompt? No. Overusing "cinematic" produces a generic stylized look. Specify the actual technique instead — 35mm film grain, shallow depth of field, anamorphic lens flare, golden-hour backlight. Concrete cinematography terms give the model far more control than the vague label "cinematic."

Will the same camera prompt work in Sora and Runway? Camera vocabulary transfers across modern video models because they were all trained on professional film footage. The structure of subject + action + scene + camera + lighting (+ audio for Veo) works for Veo 3, Kling, Sora and Runway. Audio cues are unique to Veo 3's native audio; the visual camera language is portable.

What is the best prompt structure for Veo 3? Google recommends a five-part formula: Cinematography + Subject + Action + Context + Style & Ambiance. Lead with the camera and shot composition, define the subject and action, set the environment, then specify aesthetic and lighting. For multi-shot sequences in one clip, use timestamp prompting like [00:00-00:02] medium shot, [00:02-00:04] reverse shot.

How long can Veo 3 and Kling clips be? Veo 3.1 generates 4, 6 or 8-second clips at 720p, 1080p or 4K, and supports scene extension up to roughly 141 seconds. Kling generates 5 or 10-second clips depending on model and mode. Plan one dominant camera move per generated clip, then stitch clips for longer sequences.

Why does my AI video add motion I didn't ask for? If you don't specify camera behavior, the model fills the gap with subtle drift, sway, or a slow push-in by default. To lock a frame, explicitly write "static," "locked-off," or "no camera movement." The same applies to subject framing — state where the subject sits, or the model will recompose on its own.


By Nafiul Hasan — Founder of Prompt Architects, where he builds prompt-enhancement tooling for ChatGPT, Claude, Gemini, Veo 3, and Kling and has tested 200+ camera prompts across leading AI video models. Last updated: June 10, 2026.
