# Model Provider
Configure AI providers for the video-director pipeline. You need ElevenLabs for audio (required) and one video generation provider.
## Video Generation Providers
Pick one. Ask the user which they prefer. If no preference, recommend KIE.
### KIE (Recommended)
Most reliable for talking-head content. Fewer retries, cleaner audio.
- Go to https://docs.kie.ai/
- Create an account and navigate to the API section
- Generate an API key from the dashboard
- Set the environment variable:

```shell
export KIE_API_KEY=your_key_here
```
Models:
| Model | Best for | Notes |
|---|---|---|
| Kling 3.0 Pro | Talking-head, product demos | Most reliable for speech, fewer retries |
| Veo 3.1 | Cinematic, lifestyle b-roll | Better visual quality, less reliable for speech |
Default to Kling 3.0 Pro unless the format is b-roll or lifestyle content.
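That default rule can be sketched as a small helper. The function name and the format labels here are illustrative assumptions, not part of the pipeline's actual code:

```python
def pick_kie_model(format_type: str) -> str:
    """Pick a KIE model per the table above.

    Assumption: formats are labeled with strings like "talking-head",
    "b-roll", or "lifestyle" -- adjust to however your pipeline tags them.
    """
    if format_type.lower() in {"b-roll", "lifestyle"}:
        return "Veo 3.1"      # better visual quality for non-speech content
    return "Kling 3.0 Pro"    # most reliable for speech-heavy scenes
```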
### Wavespeed
Good for fast turnaround and async workflows with webhook callbacks.
- Go to https://wavespeed.ai/dashboard
- Create an account
- Navigate to API Keys in the dashboard
- Generate a new API key
- Set the environment variable:

```shell
export WAVESPEED_API_KEY=your_key_here
```
Models:
| Model | Best for | Notes |
|---|---|---|
| Veo 3.1 Fast | Quick iterations, testing | Faster but lower quality than KIE |
| Kling 2.1 | Budget-friendly talking-head | Older model, acceptable quality |
### Fal
Wide model selection, serverless infrastructure, pay-per-second pricing.
- Go to https://fal.ai
- Sign up and navigate to Keys in the dashboard
- Create a new API key
- Set the environment variable:

```shell
export FAL_KEY=your_key_here
```
Models: Check available models at fal.ai/models. Filter by "video generation". Model availability changes frequently — the agent should check what's currently available.
### Higgsfield
AI video and image generation platform with access to multiple models (Kling 3.0, Sora 2, Veo 3.1, WAN 2.6) through a single interface. Includes lipsync studio, motion control, and character consistency tools.
- Go to https://higgsfield.ai
- Sign up and create a workspace
- Navigate to API settings
- Generate an API key
- Set the environment variable:

```shell
export HIGGSFIELD_API_KEY=your_key_here
```
Models:
| Model | Best for | Notes |
|---|---|---|
| Kling 3.0 | Talking-head, up to 15s clips | Available through Higgsfield's unified API |
| Sora 2 | High-quality cinematic content | OpenAI's video model via Higgsfield |
| Veo 3.1 | Lifestyle, b-roll | Google's model via Higgsfield |
| WAN 2.6 | General purpose | Good all-rounder |
Higgsfield also offers lipsync studio and motion control for character actions up to 30 seconds — useful for longer-form talking-head content.
### Replicate
Access to open-source and community models. Pay-per-second pricing.
- Go to https://replicate.com
- Sign up and navigate to Account Settings
- Copy your API token
- Set the environment variable:

```shell
export REPLICATE_API_TOKEN=your_key_here
```
Models: Check available models at replicate.com/collections/video-generation. Community models vary in quality — test with a single scene before committing to a full production run.
## ElevenLabs — Audio
ElevenLabs handles speech verification and voice consistency across scenes.
- Speech-to-Text (STT) — verifies generated speech matches intended dialogue (QA gate)
- Speech-to-Speech (STS) — voice swap on the final video for consistent voice across scenes
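The STT QA gate amounts to comparing the intended dialogue against the transcript returned for a scene. A minimal sketch, assuming you already have the transcript as a string; the 0.9 similarity threshold is an illustrative assumption, not a documented value:

```python
from difflib import SequenceMatcher


def passes_qa(intended: str, transcribed: str, threshold: float = 0.9) -> bool:
    """Return True if the STT transcript matches the intended dialogue.

    Normalizes case and whitespace, then uses difflib's similarity ratio.
    threshold=0.9 is an assumed cutoff -- tune it for your tolerance.
    """
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    ratio = SequenceMatcher(None, norm(intended), norm(transcribed)).ratio()
    return ratio >= threshold
```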
### Sign Up
- Go to https://elevenlabs.io
- Sign up and navigate to Profile Settings
- Copy your API key
- Set the environment variable:

```shell
export ELEVENLABS_API_KEY=your_key_here
```
### Models
| Model | Used for | Notes |
|---|---|---|
| scribe_v1 | Speech-to-Text (STT) | Verifies dialogue accuracy per scene |
| eleven_english_sts_v2 | Speech-to-Speech (STS) | Final voice swap on concatenated video |
## Verify
After setting keys, verify:

```shell
echo $ELEVENLABS_API_KEY   # required
echo $KIE_API_KEY          # or whichever video provider was chosen
```
If a variable is empty, add it to ~/.zshrc, ~/.bashrc, or a .env file in the project root.
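An agent can run the same check programmatically before starting a pipeline. A sketch, assuming the variable names listed in this document; `check_keys` is a hypothetical helper, not part of the pipeline:

```python
import os

REQUIRED = ["ELEVENLABS_API_KEY"]  # always required (STT + STS)
VIDEO_PROVIDERS = [
    "KIE_API_KEY",
    "WAVESPEED_API_KEY",
    "FAL_KEY",
    "HIGGSFIELD_API_KEY",
    "REPLICATE_API_TOKEN",
]


def check_keys(env=None):
    """Return a list of missing configuration; empty list means ready."""
    if env is None:
        env = os.environ
    missing = [k for k in REQUIRED if not env.get(k)]
    if not any(env.get(k) for k in VIDEO_PROVIDERS):
        missing.append("one of: " + ", ".join(VIDEO_PROVIDERS))
    return missing
```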
## Key Behaviors
- ElevenLabs is always required. It handles STT verification and voice swap — no alternative.
- Ask the user which video provider they prefer. Don't assume.
- KIE is the default recommendation — most reliable for speech-heavy content.
- Wavespeed is a good alternative for async workflows with webhook support.
- Fal and Replicate give access to more models but require the agent to handle model-specific parameters.
- The video-director pipeline works the same regardless of video provider. The agent sends a prompt, gets a clip, runs QA.
- If a provider consistently fails QA gates, suggest switching providers or models before burning more credits.
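The last behavior can be made concrete with a simple failure-streak rule. Both the function and the threshold of three consecutive failures are illustrative assumptions:

```python
def should_switch_provider(qa_results, max_consecutive_failures=3):
    """Suggest switching provider/model after N consecutive QA failures.

    qa_results: list of booleans, one per generated clip (True = passed QA).
    The threshold of 3 is an assumed default, not a documented value.
    """
    streak = 0
    for passed in qa_results:
        streak = 0 if passed else streak + 1
        if streak >= max_consecutive_failures:
            return True
    return False
```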