# Model Provider
Configure AI providers for the video-director pipeline. You need ElevenLabs for audio (required) and one video generation provider.
## Video Generation Providers
Pick one. Ask the user which they prefer. If no preference, recommend KIE.
### KIE (Recommended)
Most reliable for talking-head content. Fewer retries, cleaner audio.
- Go to https://docs.kie.ai/
- Create an account and navigate to the API section
- Generate an API key from the dashboard
- Set the environment variable:

```shell
export KIE_API_KEY=your_key_here
```
Models:
| Model | Best for | Notes |
|---|---|---|
| Kling 3.0 Pro | Talking-head, product demos | Most reliable for speech, fewer retries |
| Veo 3.1 | Cinematic, lifestyle b-roll | Better visual quality, less reliable for speech |
Default to Kling 3.0 Pro unless the format is b-roll or lifestyle content.
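That default rule can be sketched as a small helper. The function name and the format labels here are illustrative assumptions, not part of the pipeline's actual code:

```python
def pick_kie_model(format_type: str) -> str:
    """Pick a KIE model per the table above.

    Assumption: formats are labeled with strings like "talking-head",
    "b-roll", or "lifestyle" -- adjust to however your pipeline tags them.
    """
    if format_type.lower() in {"b-roll", "lifestyle"}:
        return "Veo 3.1"      # better visual quality for non-speech content
    return "Kling 3.0 Pro"    # most reliable for speech-heavy scenes
```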
### Wavespeed
Good for fast turnaround and async workflows with webhook callbacks.
- Go to https://wavespeed.ai/dashboard
- Create an account
- Navigate to API Keys in the dashboard
- Generate a new API key
- Set the environment variable:

```shell
export WAVESPEED_API_KEY=your_key_here
```
Models:
| Model | Best for | Notes |
|---|---|---|
| Veo 3.1 Fast | Quick iterations, testing | Faster but lower quality than KIE |
| Kling 2.1 | Budget-friendly talking-head | Older model, acceptable quality |
### Fal
Wide model selection, serverless infrastructure, pay-per-second pricing.
- Go to https://fal.ai
- Sign up and navigate to Keys in the dashboard
- Create a new API key
- Set the environment variable:

```shell
export FAL_KEY=your_key_here
```
Models: Check available models at fal.ai/models. Filter by "video generation". Model availability changes frequently — the agent should check what's currently available.
### Higgsfield
AI video and image generation platform with access to multiple models (Kling 3.0, Sora 2, Veo 3.1, WAN 2.6) through a single interface. Includes lipsync studio, motion control, and character consistency tools.
- Go to https://higgsfield.ai
- Sign up and create a workspace
- Navigate to API settings
- Generate an API key
- Set the environment variable:

```shell
export HIGGSFIELD_API_KEY=your_key_here
```
Models:
| Model | Best for | Notes |
|---|---|---|
| Kling 3.0 | Talking-head, up to 15s clips | Available through Higgsfield's unified API |
| Sora 2 | High-quality cinematic content | OpenAI's video model via Higgsfield |
| Veo 3.1 | Lifestyle, b-roll | Google's model via Higgsfield |
| WAN 2.6 | General purpose | Good all-rounder |
Higgsfield also offers lipsync studio and motion control for character actions up to 30 seconds — useful for longer-form talking-head content.
### Replicate
Access to open-source and community models. Pay-per-second pricing.
- Go to https://replicate.com
- Sign up and navigate to Account Settings
- Copy your API token
- Set the environment variable:

```shell
export REPLICATE_API_TOKEN=your_key_here
```
Models: Check available models at replicate.com/collections/video-generation. Community models vary in quality — test with a single scene before committing to a full production run.
## ElevenLabs — Audio
ElevenLabs handles speech verification and voice consistency across scenes.
- Speech-to-Text (STT) — verifies generated speech matches intended dialogue (QA gate)
- Speech-to-Speech (STS) — voice swap on the final video for consistent voice across scenes
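The STT QA gate amounts to comparing the intended dialogue against the transcript returned for a scene. A minimal sketch, assuming you already have the transcript as a string; the 0.9 similarity threshold is an illustrative assumption, not a documented value:

```python
from difflib import SequenceMatcher


def passes_qa(intended: str, transcribed: str, threshold: float = 0.9) -> bool:
    """Return True if the STT transcript matches the intended dialogue.

    Normalizes case and whitespace, then uses difflib's similarity ratio.
    threshold=0.9 is an assumed cutoff -- tune it for your tolerance.
    """
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    ratio = SequenceMatcher(None, norm(intended), norm(transcribed)).ratio()
    return ratio >= threshold
```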
### Sign Up
- Go to https://elevenlabs.io
- Sign up and navigate to Profile Settings
- Copy your API key
- Set the environment variable:

```shell
export ELEVENLABS_API_KEY=your_key_here
```
### Models
| Model | Used for | Notes |
|---|---|---|
| scribe_v1 | Speech-to-Text (STT) | Verifies dialogue accuracy per scene |
| eleven_english_sts_v2 | Speech-to-Speech (STS) | Final voice swap on concatenated video |
## Verify
After setting keys, verify:

```shell
echo $ELEVENLABS_API_KEY   # required
echo $KIE_API_KEY          # or whichever video provider was chosen
```
If a variable is empty, add it to ~/.zshrc, ~/.bashrc, or a .env file in the project root.
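An agent can run the same check programmatically before starting a pipeline. A sketch, assuming the variable names listed in this document; `check_keys` is a hypothetical helper, not part of the pipeline:

```python
import os

REQUIRED = ["ELEVENLABS_API_KEY"]  # always required (STT + STS)
VIDEO_PROVIDERS = [
    "KIE_API_KEY",
    "WAVESPEED_API_KEY",
    "FAL_KEY",
    "HIGGSFIELD_API_KEY",
    "REPLICATE_API_TOKEN",
]


def check_keys(env=None):
    """Return a list of missing configuration; empty list means ready."""
    if env is None:
        env = os.environ
    missing = [k for k in REQUIRED if not env.get(k)]
    if not any(env.get(k) for k in VIDEO_PROVIDERS):
        missing.append("one of: " + ", ".join(VIDEO_PROVIDERS))
    return missing
```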
## Key Behaviors
- ElevenLabs is always required. It handles STT verification and voice swap — no alternative.
- Ask the user which video provider they prefer. Don't assume.
- KIE is the default recommendation — most reliable for speech-heavy content.
- Wavespeed is a good alternative for async workflows with webhook support.
- Fal and Replicate give access to more models but require the agent to handle model-specific parameters.
- The video-director pipeline works the same regardless of video provider. The agent sends a prompt, gets a clip, runs QA.
- If a provider consistently fails QA gates, suggest switching providers or models before burning more credits.
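The last behavior can be made concrete with a simple failure-streak rule. Both the function and the threshold of three consecutive failures are illustrative assumptions:

```python
def should_switch_provider(qa_results, max_consecutive_failures=3):
    """Suggest switching provider/model after N consecutive QA failures.

    qa_results: list of booleans, one per generated clip (True = passed QA).
    The threshold of 3 is an assumed default, not a documented value.
    """
    streak = 0
    for passed in qa_results:
        streak = 0 if passed else streak + 1
        if streak >= max_consecutive_failures:
            return True
    return False
```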