image-to-video

Originally from inference-sh/skills

Image-to-Video AI Generation — Skill Reference

Version: 1.0.0 | Updated: 2026-03-02 | Category: Content & Video

Table of Contents

1. Tool Comparison Matrix
2. Detailed Tool Profiles
3. Universal Prompt Best Practices
4. Camera Movement Reference
5. Subject Animation Guide
6. Consistency & Stability
7. Avoiding Distortion & Common Mistakes
8. Text & Logo Preservation
9. Thumbnail-to-Video Specific Guide
10. Prompt Templates

1. Tool Comparison Matrix

Tool	Best Model (Mar 2026)	Max Length	I2V	Free Tier	Max Resolution	Native Audio	Best For
Runway	Gen-4.5	10s	Yes	125 one-time credits (~25s Gen-4 Turbo)	4K (upscale)	No	Cinematic consistency, character ref
Kling	Kling 3.0 / 2.6 Pro	15s (3.0) / 10s (2.6)	Yes	66 daily credits (360-540p, watermark)	1080p (Master)	Yes (2.6+)	Motion control, product detail, fashion
Pika	Pika 2.5	10s	Yes	80 monthly credits (480p, watermark)	1080p+ (paid)	No	Creative effects (Pikaswaps, Pikadditions)
Luma	Ray3 / Ray3 Modify	20s (720p+)	Yes	30 gens/month (draft res, watermark)	1080p	No	Long clips, start+end frame, cinematic
Sora	Sora 2 / Sora 2 Pro	25s	Yes	None (Plus $20/mo minimum)	1080p (Pro: 1792x1024)	Yes	Narrative scenes, physics, dialogue
Vidu	Vidu Q3	16s	Yes	3 videos/month (720p, watermark)	4K (Q3 Pro)	Yes (native)	Multi-shot sequences, synced audio
Hailuo/MiniMax	Hailuo 2.3	10s	Yes	Daily bonus credits (720p, watermark)	1080p (paid)	No	Speed, social content, A/B testing
Google Veo	Veo 3.1	8s	Yes (Ingredients)	Limited (Gemini free: older model)	4K (3840x2160)	Yes	4K output, film language, camera control
Adobe Firefly	Firefly Video	5s	Yes	Limited credits (with CC sub)	2K native (up to 8K upscale)	No	Commercial-safe (IP indemnity), integration with CC
Seedance	Seedance 2.0 (ByteDance)	15s	Yes	Free credits on signup	1080p	Yes (native)	Multimodal input, fast generation
WAN	WAN 2.6 / 2.1	10s	Yes	Open source (run locally)	1080p	No	Open source, self-hosted, general-purpose

2. Detailed Tool Profiles

Runway (Gen-4 / Gen-4.5)

Current Models:

Gen-4.5 (latest, Jan 2026): State-of-the-art motion quality, prompt adherence, visual fidelity. Variable durations 2-10s.
Gen-4 Turbo: Fast, economical (5 credits/sec vs Gen-4.5 at 25 credits/sec). Good for iteration.
Gen-4: Mid-tier (12 credits/sec). Balanced quality/cost.

Image-to-Video Specifics:

Upload a reference image + text prompt describing motion
Choose duration (5 or 10 seconds; Gen-4.5 accepts variable lengths from 2 to 10 seconds) and aspect ratio
Enable "Fixed Seed" for reproducible motion
Reference images maintain character appearance, clothing, features across scenes
Strong spatial understanding — objects/backgrounds stay coherent during camera movement

Pricing:

Free: 125 one-time credits (~25s of Gen-4 Turbo, ~5s of Gen-4.5)
Standard: $12-15/mo (625 credits)
Pro: $28-35/mo (2,250 credits)
Unlimited: $76-95/mo (2,250 fast + unlimited relaxed)
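
The per-second credit rates quoted in this profile make it easy to estimate how far a credit balance goes. The following is a minimal sketch, assuming the rates listed above (5, 12, and 25 credits per second) still apply. The dictionary keys and function names are illustrative and are not part of any Runway API.

```python
# Credits consumed per second of generated video, as quoted in this profile.
# The model keys are hypothetical labels chosen for this sketch.
RUNWAY_CREDITS_PER_SECOND = {
    "gen4_turbo": 5,
    "gen4": 12,
    "gen4.5": 25,
}


def seconds_affordable(credits: int, model: str) -> float:
    """Return how many seconds of video a credit balance covers."""
    return credits / RUNWAY_CREDITS_PER_SECOND[model]


def credits_needed(seconds: float, model: str) -> float:
    """Return the credits a clip of the given length would consume."""
    return seconds * RUNWAY_CREDITS_PER_SECOND[model]
```

With the 125 free credits, `seconds_affordable` gives 25 seconds on Gen-4 Turbo and 5 seconds on Gen-4.5, which matches the free-tier estimate above.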

Prompt Best Practices (Runway-Specific):

Focus prompts EXCLUSIVELY on motion — do NOT re-describe what is in the image
Start simple, iterate by adding detail
Use camera terms: pan, tilt, dolly, orbit, zoom, truck, pedestal, crane, rack focus, crash zoom
Structure: "The camera [motion] as [subject action]"
Abstract/conceptual language causes unpredictable results — be specific and physical
Re-describing image elements in detail can reduce motion or cause artifacts

Sources: Runway Pricing | Gen-4 Research | Gen-4.5 Research

Kling AI (Kling 3.0 / 2.6 Pro)

Current Models:

Kling 3.0 (latest): Scene-aware generation, character/prop consistency, native audio, 3-15s clips
Kling 2.6 Pro: Built-in English and Chinese audio, stronger prompt control, cinematic realism
Kling 2.6 Motion Control: Upload a motion reference video to guide character movement
Variants: Turbo (fast), Pro (balanced), Master (highest quality)

Image-to-Video Specifics:

Upload image as subject + describe movement in prompt
Motion Control mode: image + reference video for precise motion transfer
Preserves edges, logos, and fabric details (great for product/fashion)

Pricing:

Free: 66 daily credits (resets every 24h, no rollover). 360-540p, watermarked, non-commercial
Paid plans: $6.99-180/mo depending on tier

Prompt Best Practices (Kling-Specific):

For I2V: describe ONLY what should move/change + camera behavior. The image IS the scene.
Keep ONE main action ("hero action"). Hint at secondary motion only.
For Motion Control: do NOT describe motion in prompt (the reference video defines it). Use prompt for environment/look only.
Use terms like "slow push-in", "drone follow", "lateral track"
Describe pace with words like "glides smoothly" or "jerks to a halt"
Ensure character limbs are visible in source image (hidden limbs cause hallucination/extra fingers)
Leave "breathing room" around subject for movement
Match aspect ratios between image and motion reference
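
The aspect-ratio rule can be checked before uploading anything. This is a small sketch that compares the pixel dimensions of the source image and the motion reference video. The 1% tolerance is an assumption made here, not a documented Kling threshold, and the function names are illustrative.

```python
from typing import Tuple


def aspect_ratio(width: int, height: int) -> float:
    """Width divided by height, e.g. 1920x1080 -> about 1.778."""
    return width / height


def ratios_match(image_size: Tuple[int, int],
                 video_size: Tuple[int, int],
                 tolerance: float = 0.01) -> bool:
    """True when two (width, height) pairs share an aspect ratio.

    The comparison is relative, so small encoder padding
    (1920x1088 vs 1920x1080) still counts as a match.
    """
    a = aspect_ratio(*image_size)
    b = aspect_ratio(*video_size)
    return abs(a - b) / b <= tolerance
```

For example, a 1920x1080 image matches a 1280x720 reference clip, while a 1080x1920 portrait image does not match a landscape reference.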

Sources: Kling AI | Kling 3.0 Guide | Kling 2.6 Motion Control

Pika Labs (Pika 2.5)

Current Models:

Pika 2.5 (latest): Sharper, smoother cinematic clips. Upgraded engine.
Pikaformance: Talking face model for lifelike voice-to-face performances
AI Selves: Personalized AI avatar creation

Key Features:

Pikaframes: Turn 2-5 images into smooth transition video with realistic movement
Pikaswaps: Replace objects in video (e.g., dog -> robot) with preserved lighting/motion
Pikadditions: Insert new characters/objects into footage
Scene Ingredients: Upload your own characters/objects for consistency

Pricing:

Free: 80 monthly credits. 480p only, watermarked, non-commercial
Paid: Unlocks all resolutions, removes watermark, commercial use

Prompt Best Practices (Pika-Specific):

Better suited to creative/stylized transformations than to photorealistic output
Use Pikaframes for multi-image storytelling
Specify lighting and physics behavior for realistic material interactions
Best for short creative social clips and effects-heavy content

Sources: Pika Pricing | Pika 2.5 Release

Luma Dream Machine (Ray3)

Current Models:

Ray3: Primary generation model. Supports 5-20s video depending on resolution.
Ray3 Modify: Modify existing footage with character reference images
Ray3.14: Draft resolution model (available on free tier)

Image-to-Video Specifics:

Upload still image, animate with natural motion and cinematic camera action
Start+End frame feature: provide first and last frame, AI generates the transition
Adds subtle camera pans, zooms, perspective shifts automatically

Pricing:

Free: $0/mo. 30 gens/month. Draft resolution (Ray3.14), 720p images, watermarked, personal only
Lite: $9.99/mo. 3,200 credits. 1080p images, watermarked, non-commercial
Plus: $29.99/mo. 10,000 credits. No watermark, commercial rights
Unlimited: $94.99/mo. 10,000 fast + unlimited relaxed

Video Duration by Resolution:

540p SDR: 5s (160 credits), 10s (320 credits)
720p SDR: 5-20s
1080p SDR: up to 20s

Sources: Luma Pricing | Dream Machine | Ray3 Info

OpenAI Sora (Sora 2)

Current Models:

Sora 2: Text-to-video and image-to-video with synchronized audio
Sora 2 Pro: Higher resolution (1792x1024) and better quality

Image-to-Video Specifics:

Start with a still image and expand it into motion
Physically accurate, realistic, controllable
Can insert people into any Sora-generated environment with accurate appearance and voice
Native dialogue and sound effects generation

Pricing:

NO free tier (as of Jan 10, 2026)
ChatGPT Plus ($20/mo): Unlimited 480p video generation
ChatGPT Pro ($200/mo): Higher quality, priority access
API: $0.10/sec (720p), $0.30/sec (720p Pro), $0.50/sec (1024p Pro)
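
Because API pricing is quoted per second, the cost of a batch of test clips is simple arithmetic. This is a hedged sketch using the three rates listed above. The tier labels are hypothetical names for this example and are not official API model identifiers.

```python
# USD per second of generated video, as quoted in the pricing list above.
# Tier labels are illustrative, not official identifiers.
SORA_API_USD_PER_SECOND = {
    "720p": 0.10,
    "720p_pro": 0.30,
    "1024p_pro": 0.50,
}


def clip_cost_usd(seconds: float, tier: str) -> float:
    """Cost of one clip, rounded to cents."""
    return round(seconds * SORA_API_USD_PER_SECOND[tier], 2)


def batch_cost_usd(seconds: float, tier: str, variations: int) -> float:
    """Cost of generating several variations of the same clip length."""
    return round(clip_cost_usd(seconds, tier) * variations, 2)
```

At these rates a 10-second 720p clip costs $1.00, so iterating at the cheapest tier before re-rendering the final version at a Pro tier keeps exploration inexpensive.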

Prompt Best Practices (Sora-Specific):

Rewards prompts describing INTENT and MOOD, not just motion
Use director-style framing and gradual motion introduction
Structure prompts in distinct sections: what happens, visual style, audio elements
Be explicit about sound (dialogue, foley, music, mood)
Specify character positioning, framing, emotional states, gestures
Describe physics: "gentle collision" vs "violent crash", "heavy object slides" vs "light feather floats"
Supports 15-25 second clips; describe how the pacing progresses over the clip.
Specify 24fps for cinematic feel

Sources: Sora 2 Guide | Sora Announcement

Vidu (Vidu Q3)

Current Models:

Vidu Q3 (latest): Native audio+video in one pass, up to 16s, 2K resolution, multi-shot "Smart Cuts"
Vidu Q2: Previous gen. Natural motion, film-like camera effects.
Reference-to-Video 2.0: Character/subject consistency across generations

Key Features:

First AI model to generate multi-shot, edited-style sequences with synced audio from a single prompt
"Smart Cuts" for automatic multi-shot sequences
Audio: BGM + SFX synced to scene rhythm
Up to 4K in Q3 Pro via API

Pricing:

Free: 3 videos/month. 720p, watermarked
Paid plans available on vidu.com

Sources: Vidu | Vidu Q3 Guide | Vidu Q3 on WaveSpeed

Hailuo / MiniMax (Hailuo 2.3)

Current Models:

Hailuo 2.3 (latest): Improved physical actions, stylization, character micro-expressions, anime support
Hailuo 02: Standard and Fast variants. 768p and 1080p, up to 10s
Media Agent: Multi-modal creation with minimal manual editing

Pricing:

Free: $0/mo. Daily bonus credits. 720p, watermarked. Peak-hour wait times.
Standard: $9.99/mo. 1,000 credits, fast-track, no watermark, up to 5 tasks
Unlimited: $94.99/mo. Unlimited credits

Prompt Best Practices (Hailuo-Specific):

Works best with clean images and modest motion requests
Great for rapid A/B testing and short-form social content
Strong anime/stylized content support in 2.3

Sources: Hailuo AI | MiniMax Hailuo 2.3

Google Veo (Veo 3.1)

Current Models:

Veo 3.1: 4K output (3840x2160), vertical video (9:16), "Ingredients to Video" (up to 4 reference images)
Veo 3 Standard: Older model available to some free users
Veo 3 Fast: Lower-cost option

Key Features:

FIRST mainstream AI model with true 4K output
"Ingredients to Video": Accept up to 4 reference images per generation
Character identity consistency across scene changes
Native vertical video for YouTube Shorts / TikTok / Reels
Built-in audio generation

Pricing:

Free (Gemini): 100 monthly AI credits for Flow/Whisk. May get Veo 3 Standard (not 3.1)
Pro ($19.99/mo): Limited Veo 3.1 access
Ultra ($124.99/3mo or ~$42/mo): 25,000 monthly credits, full Veo 3.1
API: Veo 2 at $0.35-0.50/sec

Prompt Best Practices (Veo-Specific):

Excels with film language — reference shot types and pacing
Separate subject stability from camera motion in prompts
Input images should be 720p+ with 16:9 or 9:16 aspect ratio
Prompts referencing specific shot types produce more controlled results

Sources: Veo 3.1 4K Update | Veo 3.1 Blog | Google DeepMind Veo

Adobe Firefly Video

Current Model: Firefly Video (Feb 2026)

Key Features:

5s clips per generation
Native 2K resolution (up to 8K with Upscale)
IP indemnity — commercially safe, trained on licensed content
QuickCut: Upload b-roll or generate footage, auto-create structured first cut
Deep integration with Premiere Pro, After Effects, Creative Cloud

Pricing:

Firefly Standard: $9.99/mo (2,000 premium credits). ~20 videos at 100 credits/5s clip
Firefly Pro: $19.99/mo (4,000 premium credits)
Firefly Premium: $199.99/mo (50,000 premium credits)
Jan-Mar 2026 promo: Unlimited generations on paid plans

Best For: Enterprise/agency use where IP indemnity matters. Integration with existing Adobe workflows.

Sources: Adobe Firefly Pricing | Firefly Blog

Seedance 2.0 (ByteDance)

Current Model: Seedance 2.0

Key Features:

Unified multimodal audio-video joint generation (text, image, audio, video inputs)
4-15s video length
1080p resolution
30% faster than Seedance 1.0
Native audio generation (BGM + SFX)

Pricing:

Free credits on signup (check in daily for more)

Sources: Seedance 2.0 | Seedance on fal.ai

WAN 2.6 / 2.1 (Open Source)

Current Models:

WAN 2.6: Latest release
WAN 2.1: Widely available, open-source on Hugging Face

Key Features:

Open source — run locally, no credits needed
1.3B and 14B parameter variants
Text-to-video and image-to-video generation, with in-video text rendering in Chinese and English
Realistic physics simulation
Great general-purpose all-rounder

Pricing: Free (open source). Hardware costs only.

Best For: Self-hosted workflows, privacy-sensitive projects, unlimited generation without credits

Sources: WAN GitHub | WAN on HuggingFace

3. Universal Prompt Best Practices

The Golden Rules

Separate identity from motion. The image defines WHO/WHAT. The prompt defines HOW it MOVES.
Do NOT re-describe the image. Doing so can reduce motion or cause visual artifacts.
Start simple, iterate. Begin with one action, one camera move. Add complexity after testing.
Be physically specific, not conceptual. "Camera slowly pushes in" works better than "dramatic emphasis".
Aim for 3-4 descriptive elements per component. Piling on adjectives beyond that tends to degrade quality.

Prompt Structure Formula

[Camera movement], [pace/speed], [subject action], [environmental motion/details]

Example:

Slow push-in, steady cinematic pace, the developer's fingers type on the glowing keyboard,
holographic UI panels float and pulse with soft blue light around the workspace
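
The formula is just four comma-separated slots, so it can be assembled programmatically when generating many prompt variations. This is a minimal sketch; the function name and the rule of skipping empty slots are choices made for this example.

```python
def build_motion_prompt(camera: str, pace: str,
                        subject_action: str, environment: str = "") -> str:
    """Join the four formula slots in order, skipping any left empty."""
    slots = [camera, pace, subject_action, environment]
    return ", ".join(s.strip() for s in slots if s and s.strip())


prompt = build_motion_prompt(
    camera="Slow push-in",
    pace="steady cinematic pace",
    subject_action="the developer's fingers type on the glowing keyboard",
    environment="holographic UI panels float and pulse with soft blue light "
                "around the workspace",
)
print(prompt)
```

Keeping each slot as a separate argument makes it easier to change one variable at a time between generations, which is the iteration approach recommended in the Golden Rules.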

The 8-Point Shot Grammar (Advanced)

For consistent cinematic outputs, cover these 8 elements:

Element	What to Specify	Example
1. Subject	Who/what is the focus	"A developer at a desk"
2. Emotion/Mood	Tone of the scene	"focused, intense concentration"
3. Optics/Framing	Shot type and lens	"medium close-up, 35mm lens"
4. Motion	Camera + subject movement	"slow dolly in, subtle typing motion"
5. Lighting	Light source and quality	"cool monitor glow, purple ambient neon"
6. Style	Visual aesthetic	"cinematic, dark moody, tech noir"
7. Audio (if supported)	Sound design	"mechanical keyboard clicks, ambient hum"
8. Continuity	What stays constant	"face remains still, same expression"
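
One way to keep all eight elements consistent across a batch is to store them as a structured record and render the prompt from it. The sketch below is an illustration only. The class name, field names, and sentence-style rendering are assumptions of this example, and the audio element is emitted only when the target tool is marked as supporting it.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """The eight shot-grammar elements from the table above."""
    subject: str
    mood: str
    framing: str
    motion: str
    lighting: str
    style: str
    audio: str = ""        # only used when the tool generates audio
    continuity: str = ""   # what must stay constant

    def to_prompt(self, audio_supported: bool = False) -> str:
        parts = [self.subject, self.mood, self.framing,
                 self.motion, self.lighting, self.style]
        if audio_supported and self.audio:
            parts.append(f"audio: {self.audio}")
        if self.continuity:
            parts.append(self.continuity)
        return ". ".join(parts) + "."


shot = ShotSpec(
    subject="A developer at a desk",
    mood="focused, intense concentration",
    framing="medium close-up, 35mm lens",
    motion="slow dolly in, subtle typing motion",
    lighting="cool monitor glow, purple ambient neon",
    style="cinematic, dark moody, tech noir",
    audio="mechanical keyboard clicks, ambient hum",
    continuity="face remains still, same expression",
)
print(shot.to_prompt(audio_supported=True))
```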

4. Camera Movement Reference

Movement Types with Prompt Keywords

Movement	Description	Prompt Keywords	Best For
Static/Locked	Camera stays still, subject moves	"static shot", "locked camera", "fixed frame"	Subtle expressions, product focus
Pan	Horizontal swivel from fixed point	"pan left", "pan right", "slow pan", "sweeping pan"	Revealing landscapes, following action
Tilt	Vertical angle up/down from fixed point	"tilt up", "tilt down", "slow tilt", "dramatic tilt"	Height emphasis, outfit reveal
Push-in/Dolly In	Camera moves toward subject	"push in", "dolly in", "slow push-in", "intimate push"	Building tension, product detail
Pull-back/Dolly Out	Camera moves away from subject	"pull back", "dolly out", "reveal pull-back"	Context reveal, scene endings
Truck	Camera moves parallel to subject	"truck left", "truck right", "lateral movement"	Walking scenes, shelf scanning
Pedestal	Camera moves vertically (elevator-like)	"pedestal up", "pedestal down", "rising reveal"	Revealing hidden elements
Tracking	Camera follows moving subject	"tracking shot", "follow shot", "match pace"	Action sequences, character walk
Orbit/Arc	Camera circles around subject	"orbit clockwise", "orbit counterclockwise", "arc around", "slow orbit"	Hero shots, product showcase, dramatic
Crane/Boom	Sweeping vertical + horizontal move	"crane up", "crane shot", "boom up", "sweeping crane"	Epic establishing shots, crowd reveals
Rack Focus	Focus shifts between planes	"rack focus", "shift focus", "focus pull"	Attention redirection
Crash Zoom	Very fast dramatic zoom	"crash zoom", "snap zoom", "whip zoom"	Action beats, comedy, emphasis
Zoom	Lens zooms in/out (not physical move)	"zoom in", "zoom out", "slow zoom"	Drawing attention, reveal
Handheld	Slight natural shake	"handheld", "shaky cam", "documentary style"	Realism, urgency, immediacy
FPV/First-Person	Camera IS the subject	"FPV", "first person view", "POV shot"	Immersive, gaming content

Combining Movements

You can combine camera movements for complex shots:

"Slow dolly in while panning slightly right, the camera rises gently"
"Crane up and orbit counterclockwise, revealing the full workspace"
"Tracking shot following the subject with a slight handheld shake"

Speed/Pace Modifiers

Modifier	Effect	Keywords
Very slow	Dreamy, contemplative	"very slow", "glacial pace", "barely moving"
Slow	Cinematic, elegant	"slow", "steady", "gentle", "smooth"
Medium	Natural, documentary	"natural pace", "moderate speed"
Fast	Energetic, dynamic	"fast", "dynamic", "brisk", "energetic"
Whip	Sudden, dramatic	"whip", "snap", "lightning fast", "sudden"

5. Subject Animation Guide

Subtle vs. Dramatic Motion Spectrum

Level	Description	Keywords	Use Case
Minimal	Almost imperceptible	"barely perceptible movement", "very subtle", "still with micro-motion"	Thumbnails, portrait-style
Subtle	Natural idle motion	"gentle sway", "subtle breathing", "slight movement", "soft idle"	Professional headshots, calm scenes
Moderate	Clear but controlled	"natural movement", "smooth gesture", "controlled action"	Product demos, presentations
Dynamic	Active, energetic	"active movement", "energetic", "fluid motion"	Action scenes, sports
Dramatic	Maximum motion	"explosive motion", "dramatic action", "intense movement"	Music videos, trailers

Animating Specific Elements

Hair/Clothing:

"hair gently moves as if from a light breeze"
"coat fabric ripples softly"
"scarf billows in the wind"

Eyes/Face (CAREFUL — most distortion-prone):

"eyes blink naturally"
"subtle smile forms"
"gaze shifts to the right"

WARNING: Keep facial animation minimal to avoid distortion. "Natural blinking" and "subtle expression" are safest.

Hands/Typing:

"fingers move across keyboard with natural rhythm"
"hands gesture subtly while speaking"
"subtle finger movement on the trackpad"

Environment/Background:

"particles float gently in the air"
"screen content scrolls slowly"
"ambient light pulses softly"
"clouds drift across the sky"

6. Consistency & Stability

Keeping the Subject Stable

Identity locks in prompt: "same face, same outfit, same hairstyle, consistent proportions"
Fixed Seed (Runway): Enable for reproducible motion across iterations
Reference images (Runway Gen-4+, Veo 3.1): Upload character reference for cross-scene consistency
Minimize facial motion: Faces drift the most. Keep facial expressions subtle.
Foreground priority: Place main character in foreground, blur secondary faces
One subject focus: Multiple moving subjects = more drift. Focus on ONE.

Maintaining Visual Coherence

Use the SAME image for multi-clip generation (don't switch source images)
Save and reuse exact style parameters across batches (colors, aesthetic, motion quality)
Keep lighting description consistent: "cool blue monitor glow" in every prompt
Specify what should NOT change: "the background remains static" or "the desk stays perfectly still"

Cross-Scene Consistency

Runway Gen-4+: Upload character reference image for appearance matching
Veo 3.1 Ingredients: Up to 4 reference images per generation
Kling Motion Control: Character image + motion reference video
Pika Scene Ingredients: Upload characters/objects for consistency

7. Avoiding Distortion & Common Mistakes

Top 10 Distortion Causes and Fixes

Cause	Symptom	Fix
Re-describing image content in prompt	Reduced motion, visual artifacts	Prompt should ONLY describe motion, not the scene
Too many actions at once	Chaotic, incoherent motion	ONE hero action + hint secondary motion
Abstract/conceptual language	Unpredictable results	Use specific physical descriptions
Hidden limbs in source image	Extra fingers, hallucinated hands	Ensure all limbs visible in source
Wide-angle lens in source	Perspective distortion during motion	Use neutral focal length (35-85mm framing)
Too many adjectives	Quality degradation	3-4 descriptive elements per component max
Mismatched aspect ratios	Stretching, cropping artifacts	Match source image to output aspect ratio
Excessive facial animation	Face warping, identity drift	Keep face motion minimal ("subtle", "natural blink")
Low-resolution source image	Blurry, unstable output	Use 720p+ source images minimum
Contradictory instructions	Confused model output	Review prompt for conflicts
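
Two rows of the table lend themselves to an automated pre-flight check: contradictory camera instructions and low-resolution source images. The sketch below is a rough heuristic and not a complete linter. The keyword lists are drawn from the camera movement reference in this document, and the function name and warning strings are hypothetical.

```python
import re
from typing import List

# Phrases that ask for a fixed camera.
STATIC = re.compile(
    r"\b(static shot|static camera|locked camera|fixed frame)\b", re.I)
# Phrases that ask for a moving camera. Word boundaries keep "pan"
# from matching inside words such as "panels".
MOVING = re.compile(
    r"\b(pan|tilt|dolly|push[- ]in|pull[- ]back|orbit|crane|truck|"
    r"pedestal|tracking shot|zoom)\b", re.I)


def lint_prompt(prompt: str, source_short_side: int) -> List[str]:
    """Return warnings for two of the distortion causes listed above."""
    warnings = []
    if STATIC.search(prompt) and MOVING.search(prompt):
        warnings.append("contradictory camera instructions")
    if source_short_side < 720:
        warnings.append("source image below 720p")
    return warnings
```

A prompt such as "Static shot, slow orbit clockwise around the desk" is flagged as contradictory, while "Static locked camera, holographic panels pulse with soft light" passes.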

Negative Prompt Keywords (where supported)

Place critical exclusions first, since many models appear to weight earlier terms more heavily:

"blurry, low resolution, distorted, warped face, extra fingers, glitchy text,
unnatural movements, chaotic cuts, morphing features, flickering"
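
If negative prompts are assembled from several sources, a small helper can keep the critical exclusions at the front and drop duplicates. This is a sketch only. Which terms count as critical is a per-project judgment, and the function name is illustrative.

```python
from typing import Iterable


def build_negative_prompt(critical: Iterable[str],
                          optional: Iterable[str] = ()) -> str:
    """Join terms with critical ones first, removing case-insensitive duplicates."""
    seen = set()
    ordered = []
    for term in list(critical) + list(optional):
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            ordered.append(cleaned)
    return ", ".join(ordered)


negative = build_negative_prompt(
    critical=["warped face", "extra fingers"],
    optional=["blurry", "Warped Face", "flickering"],
)
print(negative)
```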

Quality Safeguards

Source image quality matters most. The cleaner the keyframe, the less the model invents.
Generate the source image with a good image model first (FLUX, Seedream 4.5, Midjourney)
Iterate ONE variable at a time when fixing issues (motion strength, camera move, style complexity)
Use preview/draft resolution first, then upscale the winner

8. Text & Logo Preservation

The Core Problem

AI video models struggle with text and logos. They frequently warp, blur, or morph text during motion. This is a fundamental limitation of current diffusion models.

Mitigation Strategies

Minimize motion near text areas:

"the text/logo remains perfectly stationary in the frame"
"camera movement avoids the text area"
"text stays sharp and readable throughout"

Use static camera for text-heavy areas: If text must be visible, keep the camera locked and animate only non-text elements.

Specify high contrast text:

"bold high-contrast text", "clear sans-serif text", "readable block letters"

Post-production approach (recommended for important text):

- Generate the video WITHOUT text
- Overlay text/logos in video editing (Premiere, After Effects, CapCut)
- This guarantees readability

Kling 2.6 for logos: Best at preserving edges and logos in product shots.

Short duration helps: Text stays more stable in 3-5s clips than 10s+.

9. Thumbnail-to-Video Specific Guide

For Tech/Coding Thumbnails (Dark bg, neon, workspace, developer)

This section is specifically designed for thumbnails with: dark backgrounds, purple/teal neon lighting, desk/workspace scenes, MacBook/monitors, developer character, floating UI elements, holographic panels.

Best Camera Movements for Desk/Workspace Scenes

Recommended (in order of effectiveness):

Slow Push-In (BEST for thumbnails):

"Slow push-in toward the developer's workspace, steady cinematic pace,
ambient particles float gently, monitor screens glow softly"

Why: Creates focus, minimal distortion, keeps face stable.

Subtle Orbit (dramatic, good for hero shots):

"Very slow orbit clockwise around the developer at the desk,
neon reflections shift on the monitor surface, floating UI elements rotate gently"

Why: Adds depth and drama without disrupting the subject.

Static + Ambient Motion (safest for face stability):

"Static locked camera, the developer sits motionless at the desk,
holographic panels pulse with soft light, particles drift upward,
screen content scrolls slowly"

Why: Zero face distortion. All motion is environmental.

Gentle Pedestal Up (reveal shot):

"Slow pedestal up from the keyboard level, rising to reveal the developer's face
and floating holographic displays, purple ambient light pulses"

Dolly Back + Reveal:

"Slow dolly backward revealing the full workspace setup,
multiple monitors glow, floating code snippets hover in the air"

Animating Floating UI Elements

"Translucent holographic panels float around the workspace, slowly rotating and pulsing"
"Code snippets hover in mid-air with a soft cyan glow, gently bobbing up and down"
"Floating UI windows orbit the developer, each displaying different data visualizations"
"Semi-transparent screens drift slowly, reflecting purple and blue neon light"
"Holographic interface elements materialize one by one around the desk"

Key descriptors for floating elements:

"translucent", "semi-transparent", "glassmorphic"
"floating", "hovering", "drifting", "orbiting"
"pulsing", "glowing", "flickering softly"
"materializing", "fading in", "dissolving"

Making Holographic Panels Glow/Pulse

"Holographic panels emit a soft pulsating blue-purple glow"
"Screens pulse rhythmically with cyan light, intensity rising and falling"
"Neon edges of the floating panels flicker with electric energy"
"Warm glow radiates from the holographic displays, casting colored light on the developer's face"
"Panels glow brighter momentarily before dimming back, creating a breathing light effect"

Light behavior keywords:

"pulsating", "breathing light", "rhythmic glow"
"flickering", "shimmering", "radiating"
"casting colored light", "reflecting off surfaces"
"intensity rising and falling", "soft oscillation"

Adding Subtle Typing/Screen Activity

"Fingers move naturally across the backlit keyboard, screen content scrolls upward"
"Subtle typing motion, code appearing on the main monitor line by line"
"The MacBook screen displays scrolling code with a soft green-on-black terminal"
"Cursor blinks on the screen, new lines of code appear gradually"
"Multiple monitors show different live data — one scrolling code, one showing metrics"

Keeping the Person's Face Stable with Ambient Motion

This is the #1 challenge. Here is the priority order:

Face = STATIC, Everything else = MOVING:

"The developer sits perfectly still, face unchanged, steady gaze at the screen.
Around him, holographic panels pulse with light, particles float upward,
keyboard keys glow softly, ambient neon light shifts between purple and blue"

Minimal face motion only:

"The developer blinks naturally, otherwise perfectly still.
Ambient environment has floating particles, pulsing lights, and drifting UI elements"

Identity locks: Always include: "same face throughout, consistent facial features, no face morphing"

Camera choice matters:

- Static camera = most stable face
- Slow push-in = face stays stable (camera moves, not face)
- Orbit = face can drift (use with caution)
- Any motion toward/around face = highest risk

Tool selection for face stability:

- Best: Kling (Motion Control with locked face), Luma (start+end frame)
- Good: Runway Gen-4.5 (character reference), Veo 3.1 (identity consistency)
- Risky: Sora (longer clips = more drift), Pika (creative focus, less stability)

10. Prompt Templates

Template 1: Thumbnail-to-Video (Tech Workspace)

Slow push-in, smooth cinematic pace. A developer sits at a dark workspace,
face perfectly still with natural subtle blinking. Holographic UI panels float
around the desk, pulsing with soft [purple/blue/cyan] neon light. Particles
drift gently upward. The MacBook screen displays scrolling code. Ambient
lighting shifts subtly between [purple and teal]. The background remains
perfectly static while floating elements orbit slowly.

Template 2: Product Showcase (Orbit)

Slow orbit clockwise around [product], smooth cinematic pace. The [product]
sits on a dark reflective surface. Ambient particles float in the air.
Soft studio lighting highlights edges and details. The background is
[dark/gradient]. Camera completes a quarter rotation over [5-10] seconds.

Template 3: Hero Shot (Person)

Static locked camera, [duration] seconds. [Person description] stands/sits
in [environment]. Face remains perfectly still. Hair moves gently from a
subtle breeze. Atmospheric particles float in the background. Volumetric
light rays stream through [window/source]. The mood is [cinematic/dramatic/calm].

Template 4: Code/Tech Demo

[Camera movement], steady pace. Close-up of a MacBook Pro screen showing
[code editor/terminal/dashboard]. Code scrolls upward naturally. The cursor
blinks. Ambient keyboard glow pulses softly. Shallow depth of field blurs
the background workspace. [Monitor reflections shift/bokeh lights drift].

Template 5: Dramatic Reveal

Slow crane up from [starting point], sweeping reveal. Rising from [desk level/
ground level] to reveal [full scene/workspace/city]. Atmospheric fog drifts
through the scene. [Neon/ambient] lights illuminate the environment.
The scale of the scene becomes apparent as the camera rises.
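
The bracketed slots in these templates can be filled mechanically, and it helps to catch any slot left unfilled before a prompt is submitted. This is a minimal sketch using Template 2. The helper names are illustrative, and treating every `[...]` span as a placeholder is an assumption of this example.

```python
import re
from typing import Dict, List

PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")

TEMPLATE_2 = (
    "Slow orbit clockwise around [product], smooth cinematic pace. The [product] "
    "sits on a dark reflective surface. Ambient particles float in the air. "
    "Soft studio lighting highlights edges and details. The background is "
    "[dark/gradient]. Camera completes a quarter rotation over [5-10] seconds."
)


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace each [slot] that has a value; leave unknown slots untouched."""
    return PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), template)


def unfilled_placeholders(text: str) -> List[str]:
    """List the slot names still present in the text."""
    return PLACEHOLDER.findall(text)


filled = fill_template(
    TEMPLATE_2,
    {"product": "the headphones", "dark/gradient": "dark", "5-10": "8"},
)
print(filled)
```

Calling `unfilled_placeholders` on the result returns an empty list when every slot has been supplied, which makes it a convenient final check in a batch script.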

Schema

Inputs

Parameter	Type	Required	Description
source_image	Image (720p+ recommended)	Yes	The still image to animate
prompt	String	Yes	Motion description (camera + subject + environment)
tool	Enum	Yes	Which AI tool to use
duration	Integer (3-25s)	No	Target video length
aspect_ratio	Enum (16:9, 9:16, 1:1, 4:3)	No	Output aspect ratio
resolution	Enum (480p-4K)	No	Output resolution (tool-dependent)
motion_reference	Video	No	Motion reference video (Kling Motion Control only)

Outputs

Parameter	Type	Description
video	MP4	Generated video file
duration	Integer	Actual video length in seconds
resolution	String	Output resolution
credits_used	Integer	Credits consumed
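
The input table can be mirrored as a typed record with basic validation. The sketch below makes several assumptions that are not stated in the schema: images and videos are represented as file paths, `tool` is a free-form string because the enum values are not listed, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3"}


@dataclass
class I2VRequest:
    """Inputs from the schema table; paths are plain strings in this sketch."""
    source_image: str
    prompt: str
    tool: str
    duration: Optional[int] = None
    aspect_ratio: Optional[str] = None
    resolution: Optional[str] = None
    motion_reference: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request is usable."""
        errors = []
        if not self.source_image:
            errors.append("source_image is required")
        if not self.prompt.strip():
            errors.append("prompt is required")
        if self.duration is not None and not 3 <= self.duration <= 25:
            errors.append("duration must be between 3 and 25 seconds")
        if self.aspect_ratio is not None and self.aspect_ratio not in ASPECT_RATIOS:
            errors.append("aspect_ratio must be one of 16:9, 9:16, 1:1, 4:3")
        if self.motion_reference and "kling" not in self.tool.lower():
            errors.append("motion_reference applies to Kling Motion Control only")
        return errors
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a request at once.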

Composable With

Skill	How
`thumbnail-generator`	Generate thumbnail image -> animate with this skill
`nano-banana-images`	Generate source image -> animate
`video-edit`	Post-process: trim, add text overlay, music
`pan-3d-transition`	Combine I2V clips with 3D transitions
`title-variants`	Generate titles -> overlay on video
`recreate-thumbnails`	Face-swap source image -> animate

Tool Selection Decision Tree

Need 4K output?
  -> Google Veo 3.1 or Vidu Q3 Pro

Need face stability (thumbnail/portrait)?
  -> Kling Motion Control (best) or Static camera on any tool

Need native audio?
  -> Sora 2, Vidu Q3, Seedance 2.0, Kling 2.6+, or Veo 3.1

Need free / no budget?
  -> Kling free (66 daily credits) or Hailuo free (daily bonus)
  -> WAN 2.1 (open source, run locally)

Need commercial IP safety?
  -> Adobe Firefly (IP indemnity)

Need creative effects (swaps, additions)?
  -> Pika 2.5 (Pikaswaps, Pikadditions, Pikaframes)

Need longest output?
  -> Sora 2 (25s) or Luma Ray3 (20s) or Vidu Q3 (16s)

Need product/fashion detail?
  -> Kling 2.6 Pro (preserves edges, logos, fabric)

Need fast iteration?
  -> Hailuo 2.3 (speed) or Runway Gen-4 Turbo (cheap credits)

Need multi-shot sequences?
  -> Vidu Q3 (Smart Cuts) or Veo 3.1 (Ingredients)

Need open source / self-hosted?
  -> WAN 2.1/2.6 (GitHub, HuggingFace)
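
The decision tree can also be read as a lookup from requirement to candidate tools. The sketch below encodes the branches above and ranks tools by how many of the requested needs they cover. The requirement keys are hypothetical names invented for this example, and the ranking rule is an assumption rather than part of the original tree.

```python
from typing import Dict, List

# Requirement -> candidate tools, transcribed from the decision tree.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "4k": ["Veo 3.1", "Vidu Q3 Pro"],
    "face_stability": ["Kling Motion Control", "static camera on any tool"],
    "native_audio": ["Sora 2", "Vidu Q3", "Seedance 2.0", "Kling 2.6+", "Veo 3.1"],
    "free": ["Kling free tier", "Hailuo free tier", "WAN 2.1"],
    "ip_safety": ["Adobe Firefly"],
    "creative_effects": ["Pika 2.5"],
    "longest_output": ["Sora 2", "Luma Ray3", "Vidu Q3"],
    "product_detail": ["Kling 2.6 Pro"],
    "fast_iteration": ["Hailuo 2.3", "Runway Gen-4 Turbo"],
    "multi_shot": ["Vidu Q3", "Veo 3.1"],
    "self_hosted": ["WAN 2.1/2.6"],
}


def recommend(needs: List[str]) -> List[str]:
    """Tools matching any need, those covering more needs listed first."""
    counts: Dict[str, int] = {}
    for need in needs:
        for tool in RECOMMENDATIONS.get(need, []):
            counts[tool] = counts.get(tool, 0) + 1
    # sorted() is stable, so ties keep the order in which tools were first seen.
    return sorted(counts, key=lambda tool: -counts[tool])


print(recommend(["native_audio", "multi_shot"]))
```

Asking for both native audio and multi-shot sequences puts Vidu Q3 and Veo 3.1 first, since they appear under both branches.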

Quick Reference Card

Top 5 Motion Keywords That Work Across All Tools

"slow push-in" / "dolly in"
"gentle orbit" / "arc around"
"static shot with ambient motion"
"tracking shot following"
"slow pan left/right"

Top 5 Stability Keywords

"face remains perfectly still"
"same face throughout, consistent features"
"background stays static"
"subtle natural motion only"
"locked camera, environmental motion only"

Top 5 Atmosphere Keywords (for tech thumbnails)

"holographic panels pulse with [color] light"
"particles float gently upward"
"neon ambient glow shifts between [colors]"
"translucent UI elements drift slowly"
"volumetric light rays, atmospheric haze"

Last updated: 2026-03-02 | Research covers tools available as of early March 2026

Repository

aiagentwithdhruv/skills

First Seen

Mar 6, 2026
