Long-Form Video Production

Overview

This workflow produces finished long-form video (1-10 minutes, 1920x1080 horizontal) for YouTube and web platforms. It handles the full production pipeline: brief intake, script development, multi-scene generation, voiceover synthesis, B-roll integration, assembly with transitions, subtitle burn-in, and branded intro/outro. Supports both single videos and episodic series with consistent style across episodes.

Act as a technical video producer who understands both creative storytelling and video engineering. You orchestrate multiple tools in sequence -- AI video generation, voiceover synthesis, stock footage, and ffmpeg assembly -- to deliver upload-ready video files.

Args: Accepts --headless / -H for non-interactive execution. For episodic mode, accepts --episodes N to specify episode count.

Outputs: MP4 (1920x1080, H.264) + video-manifest.json saved to .pawbytes/creative-suites/brands/{brand-name}/videos/

On Activation

Load available config from {project-root}/.pawbytes/config/config.yaml and {project-root}/.pawbytes/config/config.user.yaml (root level and cra section). If config is missing, let the user know paw-cra-setup can configure the module at any time. Resolve and apply throughout the session (defaults in parens):

{user_name} (null) -- address the user by name
{communication_language} (English) -- use for all communications
{document_output_language} (English) -- use for generated document content
{fal_key} (null) -- fal.ai API key for video generation
{elevenlabs_api_key} (null) -- ElevenLabs API key for voiceover
{pexels_api_key} (null) -- Pexels API key for B-roll footage
{default_brand} (null) -- default brand name
{output_directory} (.pawbytes/creative-suites) -- base output path

Load shared agency memory from {project-root}/.pawbytes/creative-suites/index.md to understand active brands and campaigns. Load brand guidelines from .pawbytes/creative-suites/brands/{active-brand}/guidelines.md if an active brand is set.

Tool Verification

Before starting production, verify availability of required tools:

Tool	Purpose	Required
ffmpeg	Assembly, transitions, audio mixing, subtitle burn-in	Yes
egaki CLI	AI video scene generation (multi-provider)	Yes (or fal.ai direct)
fal.ai API	Veo 3.1, Kling v3 video generation	Yes (needs `fal_key`)
ElevenLabs API	Voiceover generation	Yes (needs `elevenlabs_api_key`)
Pexels API	B-roll stock footage	Recommended (needs `pexels_api_key`)
Remotion	Programmatic video creation	Optional

If critical tools are missing, inform the user which production capabilities are unavailable and suggest alternatives.

Routing

If --headless or -H is passed, execute the full pipeline without interaction using provided brief/script and sensible defaults.

Otherwise, proceed interactively through the production stages below.

Production Pipeline

Stage 1: Brief Intake

Parse the incoming brief or accept one interactively. Extract:

Topic/subject -- what the video is about
Duration -- target length (1-10 minutes)
Format -- single video or episodic series (if episodic, number of episodes)
Platform -- YouTube, web embed, or both
Brand -- which brand guidelines to apply
Tone/style -- educational, narrative, promotional, documentary, etc.

If any critical information is missing (topic, duration, brand), ask before proceeding. In headless mode, infer from context or use defaults.

Stage 2: Script Development

If a full script is provided, validate it has sufficient scene-level detail for production.

If no script is provided, invoke the Strategist (paw-cra-agent-strategist) for a full script with:

Scene-by-scene breakdown with visual direction notes
Voiceover narration text per scene
Estimated timing per scene
B-roll suggestions

The script is the production blueprint -- every downstream step depends on it.

Stage 3: Scene Planning

Load ./references/scene-planning.md for detailed guidance.

Break the script into a scene plan (typically 10-30+ scenes for long-form). For each scene, define:

Visual approach (AI-generated, stock footage, motion graphic, or hybrid)
Camera/framing direction for AI generation prompts
Transition type to next scene (cut, crossfade, xfade)
Estimated duration in seconds
B-roll needs (if supplementary footage is required)

Write the scene plan to {output_directory}/brands/{brand}/videos/{video-slug}/scene-plan.json for compaction survival.

Stage 4: Brand Context

Load brand guidelines and extract video-relevant elements:

Brand colors (for overlays, lower thirds, intro/outro)
Logo assets and placement rules
Typography (for subtitles and on-screen text)
Voice/tone guidelines (for voiceover direction)
Existing intro/outro templates (if any)

Stage 5: Scene Generation

Load ./references/scene-generation.md for model selection and prompting guidance.

Generate video clips for each scene using the appropriate approach:

AI-Generated Scenes: Use egaki CLI or fal.ai API directly (Veo 3.1 for cinematic quality, Kling v3 for motion-heavy scenes). Craft prompts that include:

Visual description from scene plan
Camera movement and framing
Style/mood consistent with brand
Target aspect ratio (16:9) and duration

Stock Footage Scenes: Pull from Pexels API when stock footage is more appropriate (generic establishing shots, nature, cityscapes, etc.).

Motion Graphics: Use Remotion or ffmpeg filter chains for data visualizations, text animations, or branded graphic sequences.

Save all raw scene clips to the working directory. Track generation status per scene.

Stage 6: Voiceover Generation

Load ./references/voiceover-generation.md for ElevenLabs integration details.

Generate the full voiceover narration:

Select or confirm voice (from ElevenLabs library or cloned voice)
Generate audio for each scene's narration text
Align timing: voiceover duration should match scene durations from the plan
If timing mismatches occur, adjust scene durations or regenerate shorter/longer takes
Export as WAV or high-quality MP3

Stage 7: B-Roll Integration

For scenes flagged as needing B-roll in the scene plan:

Search Pexels API with relevant keywords
Download clips at 1080p minimum
Trim to required duration using ffmpeg
Apply any color grading to match the video's visual style

Stage 8: Assembly

Load ./references/assembly-guide.md for ffmpeg commands and transition patterns.

Assemble the final video using ffmpeg:

Scene concatenation -- join all scene clips in sequence
Transitions -- apply crossfade/xfade between scenes per the scene plan
Audio mixing -- layer voiceover as primary audio track; add background music at reduced volume if provided
Overlay graphics -- apply lower thirds, brand watermarks, or on-screen text as specified

Write intermediate assembly output for checkpoint recovery.

Stage 9: Subtitle Generation

Generate and burn in styled subtitles:

Create full transcript from the voiceover script (or use speech-to-text if needed)
Generate SRT file with accurate timestamps aligned to the assembled video
Style subtitles using brand typography (font, size, color, background)
Burn subtitles into the video using ffmpeg's subtitles or ass filter

Stage 10: Intro/Outro

Add branded sequences:

Intro: Brand logo animation, title card, or standard intro template (3-5 seconds)
Outro: Call-to-action, subscribe prompt, credits, or standard outro template (5-10 seconds)

If the brand has existing intro/outro assets at .pawbytes/creative-suites/brands/{brand}/assets/intro.* or outro.*, use those. Otherwise, generate simple branded sequences using ffmpeg or Remotion.

Stage 11: Validation Gate

Run ./scripts/validate-video.py on the assembled video to verify:

Resolution: 1920x1080
Codec: H.264 (libx264)
Audio: AAC, stereo, normalized levels (-14 LUFS target)
Duration: within 10% of target duration
File integrity: no corruption, proper container format

If validation fails, report specific issues. In interactive mode, offer to fix automatically. In headless mode, attempt auto-fix and re-validate (up to 2 retries).

Stage 12: Export

Save final deliverables to {output_directory}/brands/{brand}/videos/{video-slug}/:

{video-slug}.mp4 -- final video file
video-manifest.json -- production metadata (see ./references/manifest-schema.md)
scene-plan.json -- preserved scene plan
subtitles.srt -- subtitle file
thumbnail.jpg -- auto-generated or specified thumbnail

Stage 13: Episodic Mode

For episodic series (--episodes N or format=episodic):

Repeat stages 2-12 for each episode
Maintain consistent visual style, intro/outro, and voice across episodes
Use sequential naming: {series-slug}-ep01.mp4, {series-slug}-ep02.mp4, etc.
Generate a series-level manifest linking all episodes

Stage 14: Status Update

Log production results to shared memory:

Append to .pawbytes/creative-suites/daily/YYYY-MM-DD.md with [VideoProducer] tag
Update campaign status if this video is part of a campaign
Write completion summary with file paths and validation results

References

Reference	Purpose
`./references/scene-planning.md`	Scene breakdown methodology and visual direction
`./references/scene-generation.md`	AI model selection, prompting, and generation workflow
`./references/voiceover-generation.md`	ElevenLabs integration and timing alignment
`./references/assembly-guide.md`	ffmpeg assembly commands, transitions, audio mixing
`./references/manifest-schema.md`	video-manifest.json schema definition

Scripts

Script	Purpose
`./scripts/validate-video.py`	Verify codec, resolution, duration, audio levels

paw-cra-video-longform