reel-builder

Reel Builder

Create multi-scene animated reels from AI-generated images. Generates style-consistent stills, animates them with scene-to-scene transitions, and assembles a final video.

Pipeline

Storyboard → Anchor Image → Style-Consistent Scenes → CHECKPOINT
  → Kling Start/End Frame Animation → CHECKPOINT → Assemble Reel → Add Text Overlays

Prerequisites

  • FAL_KEY set in environment (~/.zshrc and OpenEd Vault/.env)
  • ffmpeg installed (brew install ffmpeg)
  • Python requests library installed (pip install requests)

API Endpoints Used

| Step | Fal.ai Endpoint | Cost | Purpose |
|---|---|---|---|
| Anchor image | fal-ai/flux/schnell | ~$0.01 | First scene, establishes style |
| Consistent scenes | fal-ai/flux-pro/kontext/max | $0.08/img | Same character in new scenes |
| Fallback (big scene change) | fal-ai/flux-general | ~$0.075/MP | Style reference when Kontext times out |
| Animation | fal-ai/kling-video/v3/pro/image-to-video | ~$0.56/5s | Start+end frame interpolation |

Critical Learnings (from March 2026 testing)

Aspect Ratio

  • DO NOT use nano-banana-pro for custom aspect ratios — it always outputs 1024x1024 square regardless of image_size param.
  • USE flux/schnell with explicit pixel dimensions: "image_size": {"width": 768, "height": 1344} for 9:16.
  • If the input image is square, Kling outputs square video. The input image MUST be the correct aspect ratio.

Dimension Reference

| Aspect | Width (px) | Height (px) |
|---|---|---|
| 9:16 (Reels) | 768 | 1344 |
| 16:9 (YouTube) | 1344 | 768 |
| 1:1 (Feed) | 1024 | 1024 |
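
The table above can be kept in one place so every request uses the same dimensions. A minimal sketch in Python; the names IMAGE_SIZES and image_size_for are hypothetical helpers, not part of any fal.ai client:

```python
# Pixel dimensions from the Dimension Reference table, keyed by aspect ratio.
IMAGE_SIZES = {
    "9:16": {"width": 768, "height": 1344},   # Reels
    "16:9": {"width": 1344, "height": 768},   # YouTube
    "1:1": {"width": 1024, "height": 1024},   # Feed
}


def image_size_for(aspect: str) -> dict:
    """Return a copy of the image_size payload for a supported aspect ratio."""
    try:
        return dict(IMAGE_SIZES[aspect])
    except KeyError:
        raise ValueError(f"unsupported aspect ratio: {aspect!r}") from None
```

For example, image_size_for("9:16") gives the value used for "image_size" in the requests later in this document.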

Style Consistency

  • Kontext Max (flux-pro/kontext/max) is the best tool for character consistency. Give it the anchor image + describe the new scene. It preserves character design, color palette, and art style.
  • Kontext may time out on scenes very different from the anchor (e.g., indoor anchor → outdoor meadow). If it returns null, fall back to flux-general with the reference_image_url parameter.
  • flux-general with reference_strength: 0.7 is the fallback — style-consistent but slightly less character-locked.
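
The Kontext-then-fallback rule above might be sketched as follows. This is an illustration under assumptions: call_fal is a hypothetical callable (endpoint, payload) that you supply, assumed to return None when the request times out; the payload fields mirror the requests shown in this document.

```python
from typing import Callable, Optional, Tuple

KONTEXT = "fal-ai/flux-pro/kontext/max"
FALLBACK = "fal-ai/flux-general"
SIZE_9_16 = {"width": 768, "height": 1344}


def generate_scene(
    call_fal: Callable[[str, dict], Optional[dict]],
    anchor_url: str,
    scene_prompt: str,
) -> Tuple[str, Optional[dict]]:
    """Try Kontext Max first; if it returns None, retry with flux-general."""
    scene = scene_prompt.strip().rstrip(".")
    result = call_fal(KONTEXT, {
        "prompt": f"Same character and art style. {scene}. No text.",
        "image_url": anchor_url,
        "image_size": SIZE_9_16,
        "output_format": "png",
    })
    if result is not None:
        return KONTEXT, result
    # Fallback: style reference only, so the character is less tightly locked.
    result = call_fal(FALLBACK, {
        "prompt": f"{scene}. Same art style. No text.",
        "reference_image_url": anchor_url,
        "reference_strength": 0.7,
        "image_size": SIZE_9_16,
    })
    return FALLBACK, result
```

Returning the endpoint name alongside the result makes it easy to flag fallback images for closer review at the checkpoint.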

Scene-to-Scene Animation (the key technique)

  • Instead of animating each image independently, use Kling's start + end frame interpolation.
  • Pass start_image_url (Scene N) and end_image_url (Scene N+1) with a motion prompt describing the transition.
  • This creates smooth morphs between scenes — no hard cuts needed.
  • The motion prompt describes what CHANGES between the two frames, not the frames themselves.
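
To make the pairing concrete: N scenes yield N-1 clips, each running from Scene N to Scene N+1. A minimal sketch that builds the request bodies; build_transition_payloads is a hypothetical helper, and the field names are the ones used in the Kling request in Step 4:

```python
from typing import List


def build_transition_payloads(
    scene_urls: List[str],
    transition_prompts: List[str],
    duration: str = "5",
    aspect_ratio: str = "9:16",
) -> List[dict]:
    """Pair each scene with its successor and attach the transition prompt."""
    if len(transition_prompts) != len(scene_urls) - 1:
        raise ValueError("need exactly one transition prompt per adjacent scene pair")
    return [
        {
            "prompt": prompt,
            "start_image_url": start,
            "end_image_url": end,
            "duration": duration,
            "aspect_ratio": aspect_ratio,
            "generate_audio": False,
        }
        for start, end, prompt in zip(scene_urls, scene_urls[1:], transition_prompts)
    ]
```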

Assembly

  • Kling may output slightly different dimensions per clip (e.g., 1088x1904 vs 1116x1852). Always normalize to 1080x1920 with ffmpeg before concatenation:
    ffmpeg -y -i clip.mp4 -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" -c:v libx264 -pix_fmt yuv420p clip_norm.mp4
    

Workflow (Step by Step)

Step 1: Storyboard

Define scenes as a table — each needs an image prompt and a transition prompt:

| Scene | Image Prompt | Transition to Next |
|---|---|---|
| 1 | Mama fox sleeping on couch, watercolor... | Kit tiptoes in with a drawing... |
| 2 | Kit showing mama a crayon drawing... | Scene shifts outdoors, kit runs into meadow... |
| 3 | Kit playing in meadow with dandelion... | Camera pushes in to kit's face... |
| 4 | Close-up kit with curious eyes... | Kit turns and runs to mama, they embrace... |
| 5 | Mama and kit in warm embrace... | (end) |
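
If the storyboard is kept in code rather than a table, a small structure can catch a missing transition before any API money is spent. A sketch only; Scene and validate_storyboard are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scene:
    image_prompt: str
    transition: Optional[str] = None  # None only for the final scene


def validate_storyboard(scenes: List[Scene]) -> int:
    """Check the storyboard shape and return the number of transition clips."""
    if len(scenes) < 2:
        raise ValueError("a reel needs at least two scenes")
    for number, scene in enumerate(scenes[:-1], start=1):
        if not scene.transition:
            raise ValueError(f"scene {number} is missing a transition prompt")
    if scenes[-1].transition is not None:
        raise ValueError("the final scene must not have a transition")
    return len(scenes) - 1
```

A five-scene storyboard like the one above validates to four transition clips.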

Step 2: Generate Anchor Image

Use flux/schnell with explicit dimensions. The anchor establishes the style bible for all subsequent scenes.

curl -s -X POST "https://fal.run/fal-ai/flux/schnell" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "...", "image_size": {"width": 768, "height": 1344}}'

Review the anchor carefully — every subsequent image inherits its style.

Step 3: Generate Remaining Scenes (Kontext Max)

Feed the anchor image URL to Kontext Max for each scene:

curl -s -X POST "https://fal.run/fal-ai/flux-pro/kontext/max" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Same fox character and art style. [new scene description]. No text.",
    "image_url": "ANCHOR_URL",
    "image_size": {"width": 768, "height": 1344},
    "output_format": "png"
  }'

If Kontext times out (usually for very different compositions), fall back to flux-general:

curl -s -X POST "https://fal.run/fal-ai/flux-general" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "[scene description]. Same art style...",
    "reference_image_url": "ANCHOR_URL",
    "reference_strength": 0.7,
    "image_size": {"width": 768, "height": 1344}
  }'

CHECKPOINT: Review all images. Regenerate any that don't match.

Step 4: Animate with Kling (Start + End Frame)

Submit transition clips to the Kling queue. Each clip morphs from one scene to the next:

curl -s -X POST "https://queue.fal.run/fal-ai/kling-video/v3/pro/image-to-video" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "TRANSITION DESCRIPTION",
    "start_image_url": "SCENE_N_URL",
    "end_image_url": "SCENE_N+1_URL",
    "duration": "5",
    "aspect_ratio": "9:16",
    "generate_audio": false
  }'

Poll status at: GET https://queue.fal.run/fal-ai/kling-video/requests/{request_id}/status (While polling, treat both HTTP 200 and 202 responses as success.)

Get result at: GET https://queue.fal.run/fal-ai/kling-video/requests/{request_id}

Timing: ~2-3 minutes per clip. Submit all clips simultaneously to parallelize.
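
A polling loop for one clip might look like the sketch below. It assumes the status body carries a "status" field that reads "COMPLETED" when the clip is done; get_status is a hypothetical callable you supply (for example a thin wrapper around requests.get on the status URL) returning the HTTP code and the parsed JSON body.

```python
import time
from typing import Callable, Tuple


def wait_for_clip(
    get_status: Callable[[], Tuple[int, dict]],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports COMPLETED, accepting HTTP 200 and 202."""
    waited = 0.0
    while True:
        code, body = get_status()
        if code not in (200, 202):
            raise RuntimeError(f"unexpected HTTP status {code}")
        if body.get("status") == "COMPLETED":  # assumed completion marker
            return body
        if waited >= timeout_s:
            raise TimeoutError(f"clip not ready after {timeout_s:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

The default timeout of 10 minutes is an arbitrary margin over the 2-3 minutes quoted above; adjust it to taste.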

CHECKPOINT: Review each clip.

Step 5: Assemble

Normalize dimensions, concatenate, output final reel:

# Normalize each clip to 1080x1920 (scale to fit, then pad to center)
for f in clip*.mp4; do
  case "$f" in *_norm.mp4) continue ;; esac  # skip outputs from a previous run
  ffmpeg -y -i "$f" -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" -c:v libx264 -pix_fmt yuv420p "${f%.mp4}_norm.mp4"
done

# Concatenate (the glob sorts lexically, so zero-pad names: clip01, clip02, ...)
printf "file '%s'\n" *_norm.mp4 > concat.txt
ffmpeg -y -f concat -safe 0 -i concat.txt -c:v libx264 -pix_fmt yuv420p final_reel.mp4
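
One pitfall with the shell glob is ordering: with ten or more clips, clip10 sorts before clip2. A minimal Python sketch that writes the concat list in numeric order instead; natural_key and concat_list are hypothetical helper names:

```python
import re
from typing import Iterable, List


def natural_key(name: str) -> List[object]:
    """Split out digit runs so 'clip10' sorts after 'clip2'."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def concat_list(filenames: Iterable[str]) -> str:
    """Return the body of an ffmpeg concat file, one clip per line, in numeric order."""
    ordered = sorted(filenames, key=natural_key)
    return "".join(f"file '{name}'\n" for name in ordered)
```

Write the returned string to concat.txt and pass it to the ffmpeg concat command as before.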

Step 6: Text Overlays (separate step)

Text is added AFTER animation — never bake text into the images or Kling prompts (it gets distorted). Options:

  • CapCut / InShot — manual, fastest for one-offs
  • ffmpeg drawtext — scripted, good for batch
  • Remotion — programmatic, see text-on-broll skill
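
For the ffmpeg drawtext option, the command can be built in Python and run with subprocess. This is a sketch under assumptions: drawtext_command is a hypothetical helper, the position and font size are arbitrary choices, the ffmpeg build is assumed to resolve a default font, and text containing characters that need drawtext escaping is rejected rather than escaped.

```python
from typing import List


def drawtext_command(
    src: str, dst: str, text: str, start_s: float, end_s: float, fontsize: int = 64
) -> List[str]:
    """Build an ffmpeg command that shows centered text between two timestamps."""
    if any(ch in text for ch in "':\\%"):
        raise ValueError("text contains characters that need drawtext escaping")
    vf = (
        f"drawtext=text='{text}':fontsize={fontsize}:fontcolor=white:"
        f"x=(w-text_w)/2:y=h-text_h-200:"
        f"enable='between(t,{start_s},{end_s})'"
    )
    return ["ffmpeg", "-y", "-i", src, "-vf", vf,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", dst]
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting problems around the filter string.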

Costs

| Reel Type | Images | Clips | Total |
|---|---|---|---|
| 3-scene (2 transitions) | ~$0.25 | ~$1.12 | ~$1.40 |
| 5-scene (4 transitions) | ~$0.40 | ~$2.24 | ~$2.65 |
| 8-scene (7 transitions) | ~$0.65 | ~$3.92 | ~$4.60 |
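
The figures above can be roughly reproduced for other reel lengths. The sketch below assumes every image is priced at the Kontext rate (~$0.08) and every transition is one 5-second Kling clip (~$0.56); that reading of the table is an assumption, and regenerations or fallback calls are not counted. Amounts are in cents to avoid float rounding.

```python
IMAGE_CENTS = 8   # assumed: ~$0.08 per image (Kontext Max rate)
CLIP_CENTS = 56   # assumed: ~$0.56 per 5-second Kling clip


def estimate_cost_cents(num_scenes: int) -> dict:
    """Estimate reel cost in cents: one image per scene, one clip per transition."""
    if num_scenes < 2:
        raise ValueError("a reel needs at least two scenes")
    images = num_scenes * IMAGE_CENTS
    clips = (num_scenes - 1) * CLIP_CENTS
    return {"images": images, "clips": clips, "total": images + clips}
```

For five scenes this gives 40 cents of images and 224 cents of clips, 264 cents in total, in line with the ~$2.65 row.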

Prompt Tips

For image generation

  • Describe a single composed frame — think photography, not video
  • Be explicit about style: "watercolor and ink," "editorial photography," "nursery print aesthetic"
  • Always end with "No text anywhere" — models sometimes add random text
  • Include "Vertical composition" for 9:16
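
These tips are easy to forget when writing prompts by hand, so they can be applied mechanically. A small sketch; build_image_prompt is a hypothetical helper and the sentence order is just one reasonable choice:

```python
def build_image_prompt(subject: str, style: str, vertical: bool = True) -> str:
    """Compose a single-frame prompt that always ends with the no-text instruction."""
    parts = [subject.strip().rstrip("."), style.strip().rstrip(".")]
    if vertical:
        parts.append("Vertical composition")  # for 9:16 output
    parts.append("No text anywhere")
    return ". ".join(parts) + "."
```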

For Kontext (character consistency)

  • Start prompts with "Same fox character and art style" (or whatever your subject is)
  • Reference specific visual elements: "same watercolor and ink technique, same muted earth tones"
  • The more different the new scene is from the anchor, the more likely Kontext times out

For Kling transitions

  • Describe the CHANGE between frames, not the frames themselves
  • Keep to 1-2 actions per 5-second clip
  • Include ambient motion: "wildflowers sway," "light shifts," "breeze ruffles fur"
  • Don't include text descriptions — text in AI video always looks bad

Related Skills

  • video-generator — Single-shot video generation (VEO, Sora, Kling). Has the generate_video.py script with Kling provider support.
  • nano-banana-image-generator — Standalone image generation (Gemini). Good for square images only.
  • video-caption-creation — Add text overlays after assembly
  • text-on-broll — Remotion-based text overlay on video

Prompt Engineering Reference

See video-generator/references/ai-video-prompt-engineering-guide.md for a comprehensive prompting guide (camera language, lighting, motion, audio; 14 sections from 7 sources).

First Seen
Mar 18, 2026