reel-builder
Reel Builder
Create multi-scene animated reels from AI-generated images. Generates style-consistent stills, animates them with scene-to-scene transitions, and assembles a final video.
Pipeline
Storyboard → Anchor Image → Style-Consistent Scenes → CHECKPOINT
→ Kling Start/End Frame Animation → CHECKPOINT → Assemble Reel → Add Text Overlays
Prerequisites
- FAL_KEY set in environment (~/.zshrc and OpenEd Vault/.env)
- ffmpeg installed (brew install ffmpeg)
- Python requests package (pip install requests)
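Before you spend money on API calls, a preflight check can confirm the prerequisites above. The following is a minimal sketch that uses only the Python standard library. The function name missing_prerequisites is hypothetical, and the check only confirms that FAL_KEY is non-empty, not that the key is valid:

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a description of each missing prerequisite (empty list if all good)."""
    missing = []
    # Only checks that the key is present and non-empty, not that it is valid.
    if not env.get("FAL_KEY"):
        missing.append("FAL_KEY is not set in the environment")
    # ffprobe is installed together with ffmpeg by Homebrew.
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            missing.append(f"{tool} not found on PATH (brew install ffmpeg)")
    return missing
```

Call it at the top of a driver script and abort if the returned list is non-empty.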
API Endpoints Used
| Step | Fal.ai Endpoint | Cost | Purpose |
|---|---|---|---|
| Anchor image | fal-ai/flux/schnell | ~$0.01 | First scene, establishes style |
| Consistent scenes | fal-ai/flux-pro/kontext/max | $0.08/img | Same character in new scenes |
| Fallback (big scene change) | fal-ai/flux-general | ~$0.075/MP | Style reference when Kontext times out |
| Animation | fal-ai/kling-video/v3/pro/image-to-video | ~$0.56/5s | Start+end frame interpolation |
Critical Learnings (from March 2026 testing)
Aspect Ratio
- DO NOT use nano-banana-pro for custom aspect ratios. It always outputs a 1024x1024 square regardless of the image_size param.
- USE flux/schnell with explicit pixel dimensions: "image_size": {"width": 768, "height": 1344} for 9:16.
- If the input image is square, Kling outputs square video. The input image MUST be the correct aspect ratio.
Dimension Reference
| Aspect | Width | Height |
|---|---|---|
| 9:16 (Reels) | 768 | 1344 |
| 16:9 (YouTube) | 1344 | 768 |
| 1:1 (Feed) | 1024 | 1024 |
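Kling copies the input image's shape, so it helps to check each still before you animate it. The following is a minimal Python sketch built from the table above. The helper names and the 0.02 tolerance are assumptions. Note that 768x1344 is exactly 4:7 (about 0.571), not 9:16 (0.5625), so an exact ratio comparison would reject the recommended Reels size:

```python
# Pixel sizes from the Dimension Reference table.
DIMENSIONS = {
    "9:16": (768, 1344),
    "16:9": (1344, 768),
    "1:1": (1024, 1024),
}


def image_size(aspect: str) -> dict:
    """Build the image_size object used in the fal request bodies."""
    width, height = DIMENSIONS[aspect]
    return {"width": width, "height": height}


def matches_aspect(width: int, height: int, aspect: str, tol: float = 0.02) -> bool:
    """True if width/height is within tol of the target ratio (e.g. "9:16")."""
    num, den = (int(x) for x in aspect.split(":"))
    return abs(width / height - num / den) <= tol
```

Run matches_aspect on every still before it goes to Kling. A square image fails the "9:16" check and should be regenerated.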
Style Consistency
- Kontext Max (flux-pro/kontext/max) is the best tool for character consistency. Give it the anchor image and describe the new scene. It preserves character design, color palette, and art style.
- Kontext may time out on scenes very different from the anchor (e.g., indoor anchor → outdoor meadow). If it returns null, fall back to flux-general with the reference_image_url parameter.
- flux-general with reference_strength: 0.7 is the fallback. It is style-consistent but slightly less character-locked.
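You can script the Kontext-then-fallback logic. This sketch uses urllib from the standard library instead of requests, so it has no dependencies. It treats any HTTP error, timeout, or response without an image as a failure. The response shape {"images": [{"url": ...}]}, the 180-second timeout, and the helper names are assumptions. The payloads mirror the curl bodies in Step 3:

```python
import json
import urllib.request
from typing import Callable, Optional

FAL_RUN = "https://fal.run"
SIZE_9_16 = {"width": 768, "height": 1344}


def fal_post(endpoint: str, payload: dict, key: str, timeout: float = 180) -> Optional[dict]:
    """POST to a synchronous fal.run endpoint; return None on error or timeout."""
    req = urllib.request.Request(
        f"{FAL_RUN}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Key {key}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except (OSError, ValueError):  # HTTPError, URLError and socket timeouts are OSErrors
        return None


def first_image_url(result: Optional[dict]) -> Optional[str]:
    """Assumed response shape: {"images": [{"url": ...}, ...]}."""
    images = (result or {}).get("images") or []
    return images[0].get("url") if images else None


def generate_scene(description: str, anchor_url: str, key: str,
                   post: Callable[[str, dict, str], Optional[dict]] = fal_post):
    """Try Kontext Max first; fall back to flux-general with a style reference."""
    desc = description.rstrip(". ")
    kontext = {
        "prompt": f"Same fox character and art style. {desc}. No text.",
        "image_url": anchor_url,
        "image_size": SIZE_9_16,
        "output_format": "png",
    }
    url = first_image_url(post("fal-ai/flux-pro/kontext/max", kontext, key))
    if url:
        return url, "kontext"
    general = {
        "prompt": f"{desc}. Same art style as the reference image. No text.",
        "reference_image_url": anchor_url,
        "reference_strength": 0.7,
        "image_size": SIZE_9_16,
    }
    return first_image_url(post("fal-ai/flux-general", general, key)), "flux-general"
```

The post parameter lets you test the flow with a fake function. In real use, the default fal_post sends the requests. The second tuple element records which model produced the image, which is useful at the CHECKPOINT review.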
Scene-to-Scene Animation (the key technique)
- Instead of animating each image independently, use Kling's start + end frame interpolation.
- Pass start_image_url (Scene N) and end_image_url (Scene N+1) with a motion prompt describing the transition.
- This creates smooth morphs between scenes, so no hard cuts are needed.
- The motion prompt describes what CHANGES between the two frames, not the frames themselves.
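You can express the pairing rule as a small Python helper: scene N is the start frame, scene N+1 is the end frame, and each pair gets one transition prompt. The field names mirror the Step 4 curl body. The function name is hypothetical:

```python
from typing import Sequence


def transition_payloads(scene_urls: Sequence[str], transitions: Sequence[str]) -> list[dict]:
    """One Kling request per consecutive pair: scene N is the start frame, N+1 the end."""
    if len(transitions) != len(scene_urls) - 1:
        raise ValueError("need exactly one transition prompt per adjacent pair of scenes")
    return [
        {
            "prompt": prompt,  # describes what CHANGES between the two frames
            "start_image_url": start,
            "end_image_url": end,
            "duration": "5",
            "aspect_ratio": "9:16",
            "generate_audio": False,
        }
        for start, end, prompt in zip(scene_urls, scene_urls[1:], transitions)
    ]
```

Five scenes produce four payloads, which matches the 5-scene row in Costs.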
Assembly
- Kling may output slightly different dimensions per clip (e.g., 1088x1904 vs 1116x1852). Always normalize to 1080x1920 with ffmpeg before concatenation:
ffmpeg -y -i clip.mp4 -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" -c:v libx264 -pix_fmt yuv420p clip_norm.mp4
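To confirm that normalization worked, ask ffprobe for each clip's size. Every _norm clip should be exactly 1080x1920. The following is a Python sketch. The helper names are hypothetical, and it assumes ffprobe (installed alongside ffmpeg) is on PATH:

```python
import subprocess


def parse_dimensions(ffprobe_output: str) -> tuple[int, int]:
    """Parse 'WIDTHxHEIGHT' as printed by ffprobe with -of csv=s=x:p=0."""
    first_line = ffprobe_output.strip().splitlines()[0]
    width, height = first_line.split("x")[:2]
    return int(width), int(height)


def clip_dimensions(path: str) -> tuple[int, int]:
    """Read the first video stream's width and height with ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height", "-of", "csv=s=x:p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dimensions(out)


def wrong_size(paths: list[str], target=(1080, 1920), probe=clip_dimensions) -> list[str]:
    """Return the clips whose dimensions differ from target."""
    return [p for p in paths if probe(p) != target]
```

An empty result from wrong_size(glob.glob("*_norm.mp4")) means the clips are safe to concatenate.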
Workflow (Step by Step)
Step 1: Storyboard
Define scenes as a table — each needs an image prompt and a transition prompt:
| Scene | Image Prompt | Transition to Next |
|---|---|---|
| 1 | Mama fox sleeping on couch, watercolor... | Kit tiptoes in with a drawing... |
| 2 | Kit showing mama a crayon drawing... | Scene shifts outdoors, kit runs into meadow... |
| 3 | Kit playing in meadow with dandelion... | Camera pushes in to kit's face... |
| 4 | Close-up kit with curious eyes... | Kit turns and runs to mama, they embrace... |
| 5 | Mama and kit in warm embrace... | (end) |
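If you keep the storyboard as data instead of a document table, you can check its shape before generating anything. The following is a minimal sketch. The Scene class and validate function are hypothetical, and the final scene's "(end)" is represented as None:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scene:
    image_prompt: str
    transition: Optional[str]  # None for the final scene, i.e. "(end)"


def validate(storyboard: list[Scene]) -> int:
    """Check the storyboard shape and return the number of Kling clips it needs."""
    if len(storyboard) < 2:
        raise ValueError("need at least two scenes to make a transition")
    for i, scene in enumerate(storyboard[:-1], start=1):
        if not scene.transition:
            raise ValueError(f"scene {i} is missing a transition prompt")
    if storyboard[-1].transition is not None:
        raise ValueError("the last scene should have no transition")
    return len(storyboard) - 1
```

validate returns the number of clips to budget for. For the five-scene fox storyboard, that is 4.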
Step 2: Generate Anchor Image
Use flux/schnell with explicit dimensions. The anchor establishes the style bible for all subsequent scenes.
curl -s -X POST "https://fal.run/fal-ai/flux/schnell" \
-H "Authorization: Key $FAL_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "...", "image_size": {"width": 768, "height": 1344}}'
Review the anchor carefully — every subsequent image inherits its style.
Step 3: Generate Remaining Scenes (Kontext Max)
Feed the anchor image URL (images[0].url in the Step 2 response) to Kontext Max for each scene:
curl -s -X POST "https://fal.run/fal-ai/flux-pro/kontext/max" \
-H "Authorization: Key $FAL_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Same fox character and art style. [new scene description]. No text.",
"image_url": "ANCHOR_URL",
"image_size": {"width": 768, "height": 1344},
"output_format": "png"
}'
If Kontext times out (usually for very different compositions), fall back to flux-general:
curl -s -X POST "https://fal.run/fal-ai/flux-general" \
-H "Authorization: Key $FAL_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "[scene description]. Same art style...",
"reference_image_url": "ANCHOR_URL",
"reference_strength": 0.7,
"image_size": {"width": 768, "height": 1344}
}'
CHECKPOINT: Review all images. Regenerate any that don't match.
Step 4: Animate with Kling (Start + End Frame)
Submit transition clips to the Kling queue. Each clip morphs from one scene to the next:
curl -s -X POST "https://queue.fal.run/fal-ai/kling-video/v3/pro/image-to-video" \
-H "Authorization: Key $FAL_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "TRANSITION DESCRIPTION",
"start_image_url": "SCENE_N_URL",
"end_image_url": "SCENE_N+1_URL",
"duration": "5",
"aspect_ratio": "9:16",
"generate_audio": false
}'
The submit response includes a request_id. Poll status at: GET https://queue.fal.run/fal-ai/kling-video/requests/{request_id}/status
(Accept both 200 and 202 status codes while polling.)
Get result at: GET https://queue.fal.run/fal-ai/kling-video/requests/{request_id}
Timing: ~2-3 minutes per clip. Submit all clips simultaneously to parallelize.
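You can automate the submit-then-poll cycle. This Python sketch uses only the standard library. It submits every clip before polling any of them, so the clips render in parallel. It makes these assumptions: fal's queue reports status "COMPLETED" when a request finishes, the result JSON has the shape {"video": {"url": ...}}, and the 15-second interval and 15-minute cap are arbitrary. The URLs are the ones listed in Step 4:

```python
import json
import time
import urllib.request

QUEUE = "https://queue.fal.run"
KLING = "fal-ai/kling-video/v3/pro/image-to-video"


def http_json(method, url, key, payload=None, timeout=60):
    """Return (status_code, json_body). urlopen returns normally for 200 and 202
    and raises HTTPError for 4xx/5xx."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Key {key}", "Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read())


def submit(payload, key, http=http_json):
    """Queue one transition clip and return its request_id."""
    _, body = http("POST", f"{QUEUE}/{KLING}", key, payload)
    return body["request_id"]


def wait_for_result(request_id, key, http=http_json,
                    interval=15, max_wait=900, sleep=time.sleep):
    """Poll the Step 4 status URL until COMPLETED, then fetch the result JSON."""
    base = f"{QUEUE}/fal-ai/kling-video/requests/{request_id}"
    waited = 0
    while waited <= max_wait:
        code, body = http("GET", f"{base}/status", key)
        if code in (200, 202) and body.get("status") == "COMPLETED":
            return http("GET", base, key)[1]
        sleep(interval)
        waited += interval
    raise TimeoutError(f"request {request_id} still not complete after {max_wait}s")


def render_all(payloads, key, http=http_json, **poll_options):
    """Submit every clip first (so they render in parallel), then collect
    video URLs in storyboard order. Assumes results look like {"video": {"url": ...}}."""
    ids = [submit(p, key, http) for p in payloads]
    return [wait_for_result(i, key, http, **poll_options)["video"]["url"] for i in ids]
```

The http and sleep parameters exist so you can exercise the loop without network calls. Download each returned URL as clip01.mp4, clip02.mp4, and so on, in storyboard order.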
CHECKPOINT: Review each clip.
Step 5: Assemble
Normalize dimensions, concatenate, output final reel:
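# Normalize each clip to 1080x1920.
# Name clips with zero-padded numbers (clip01.mp4, clip02.mp4, ...) so globs sort in playback order.
for f in clip*.mp4; do
  case "$f" in *_norm.mp4) continue ;; esac  # skip outputs from a previous run
  ffmpeg -y -i "$f" -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" -c:v libx264 -pix_fmt yuv420p "${f%.mp4}_norm.mp4"
done
# Concatenate the normalized clips in glob order
printf "file '%s'\n" clip*_norm.mp4 > concat.txt
ffmpeg -y -f concat -safe 0 -i concat.txt -c:v libx264 -pix_fmt yuv420p final_reel.mp4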
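Shell globs sort lexically, so clip10 comes before clip2 unless the names are zero-padded. If padding isn't practical, you can write concat.txt in numeric order with Python instead. This sketch assumes each filename contains the clip number as its first run of digits (e.g. clip3_norm.mp4). It escapes single quotes the way ffmpeg's concat list syntax requires. The helper names are hypothetical:

```python
import re
from pathlib import Path


def clip_index(name: str) -> int:
    """First run of digits in the file stem, e.g. 3 for clip3_norm.mp4."""
    match = re.search(r"(\d+)", Path(name).stem)
    if match is None:
        raise ValueError(f"no clip number in {name!r}")
    return int(match.group(1))


def concat_list(paths: list[str]) -> str:
    """Concat-demuxer list in numeric order; a literal ' is written as '\\''."""
    lines = []
    for p in sorted(paths, key=clip_index):
        quoted = p.replace("'", "'\\''")
        lines.append(f"file '{quoted}'")
    return "\n".join(lines) + "\n"
```

Write the list with Path("concat.txt").write_text(concat_list(glob.glob("clip*_norm.mp4"))), then run the same ffmpeg concat command as above.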
Step 6: Text Overlays (separate step)
Text is added AFTER animation — never bake text into the images or Kling prompts (it gets distorted). Options:
- CapCut / InShot: manual, fastest for one-offs
- ffmpeg drawtext: scripted, good for batch
- Remotion: programmatic; see the text-on-broll skill
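For the ffmpeg drawtext option, the following Python sketch builds the command. Reading the caption from a file with textfile= avoids most of drawtext's escaping rules for the caption itself. The font size, position, box styling, and helper name are illustrative assumptions. If your ffmpeg build has no default font configured, add a fontfile= option that points at a .ttf:

```python
def drawtext_cmd(src: str, dst: str, caption_file: str, start: float, end: float) -> list[str]:
    """ffmpeg argv that burns the caption in caption_file between start and end seconds."""
    drawtext = (
        f"drawtext=textfile={caption_file}"
        ":fontsize=64:fontcolor=white"
        ":x=(w-text_w)/2:y=h*0.12"  # centered horizontally, near the top
        ":box=1:boxcolor=black@0.5:boxborderw=20"
        f":enable='between(t,{start},{end})'"  # quoted because of the commas
    )
    return ["ffmpeg", "-y", "-i", src, "-vf", drawtext,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", dst]
```

Run it with subprocess.run(drawtext_cmd(...), check=True). For several captions, join multiple drawtext filters with commas in the same -vf chain.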
Costs
| Reel Type | Images | Clips | Total |
|---|---|---|---|
| 3-scene (2 transitions) | ~$0.25 | ~$1.12 | ~$1.40 |
| 5-scene (4 transitions) | ~$0.40 | ~$2.24 | ~$2.65 |
| 8-scene (7 transitions) | ~$0.65 | ~$3.92 | ~$4.60 |
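You can reproduce the table from the per-unit prices in API Endpoints Used. In this sketch (hypothetical names), the image cost is one flux/schnell anchor, plus one Kontext image per remaining scene, plus a number of extra Kontext images for rerolls. With rerolls=1, it gives $0.25, $0.41, and $0.65 for 3, 5, and 8 scenes. Those figures are close to the table, so the table appears to budget about one regeneration. That reading is an inference, not something the table states. The clip costs match exactly:

```python
ANCHOR = 0.01    # fal-ai/flux/schnell, per image (approx.)
KONTEXT = 0.08   # fal-ai/flux-pro/kontext/max, per image
KLING_5S = 0.56  # Kling v3 pro, per 5-second clip (approx.)


def estimate(scenes: int, rerolls: int = 1) -> dict:
    """Rough reel cost in USD; rerolls = extra Kontext images for regenerations."""
    images = ANCHOR + (scenes - 1 + rerolls) * KONTEXT
    clips = (scenes - 1) * KLING_5S  # one transition clip per adjacent pair
    return {"images": round(images, 2), "clips": round(clips, 2),
            "total": round(images + clips, 2)}
```

flux-general fallbacks are billed per megapixel and are not included here.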
Prompt Tips
For image generation
- Describe a single composed frame — think photography, not video
- Be explicit about style: "watercolor and ink," "editorial photography," "nursery print aesthetic"
- Always end with "No text anywhere" — models sometimes add random text
- Include "Vertical composition" for 9:16
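A tiny helper can apply the image-prompt tips consistently. The following is a sketch. The function name and the sentence-joining format are assumptions, not a required prompt syntax:

```python
def image_prompt(scene: str, style: str, vertical: bool = True) -> str:
    """Compose a single-frame prompt: scene, explicit style, layout, then the no-text rule."""
    parts = [scene.rstrip(". "), style.rstrip(". ")]
    if vertical:
        parts.append("Vertical composition")  # for 9:16 output
    parts.append("No text anywhere")  # always last
    return ". ".join(parts) + "."
```

For example, image_prompt("Mama fox sleeping on couch", "Watercolor and ink") returns "Mama fox sleeping on couch. Watercolor and ink. Vertical composition. No text anywhere."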
For Kontext (character consistency)
- Start prompts with "Same fox character and art style" (or whatever your subject is)
- Reference specific visual elements: "same watercolor and ink technique, same muted earth tones"
- The more the new scene differs from the anchor, the more likely Kontext is to time out
For Kling transitions
- Describe the CHANGE between frames, not the frames themselves
- Keep to 1-2 actions per 5-second clip
- Include ambient motion: "wildflowers sway," "light shifts," "breeze ruffles fur"
- Don't include text descriptions — text in AI video always looks bad
Related Skills
- video-generator: Single-shot video generation (VEO, Sora, Kling). Has the generate_video.py script with Kling provider support.
- nano-banana-image-generator: Standalone image generation (Gemini). Good for square images only.
- video-caption-creation: Add text overlays after assembly
- text-on-broll: Remotion-based text overlay on video
Prompt Engineering Reference
See video-generator/references/ai-video-prompt-engineering-guide.md for a comprehensive prompting guide (camera language, lighting, motion, audio; 14 sections drawn from 7 sources).