
# Model Provider

Configure AI providers for the video-director pipeline. You need ElevenLabs for audio (required) and one video generation provider.


## Video Generation Providers

Pick one. Ask the user which they prefer. If no preference, recommend KIE.

### KIE (Recommended)

Most reliable for talking-head content. Fewer retries, cleaner audio.

1. Go to https://docs.kie.ai/
2. Create an account and navigate to the API section
3. Generate an API key from the dashboard
4. Set the environment variable:

   ```sh
   export KIE_API_KEY=your_key_here
   ```

Models:

| Model | Best for | Notes |
| --- | --- | --- |
| Kling 3.0 Pro | Talking-head, product demos | Most reliable for speech, fewer retries |
| Veo 3.1 | Cinematic, lifestyle b-roll | Better visual quality, less reliable for speech |

Default to Kling 3.0 Pro unless the format is b-roll or lifestyle content.
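The default rule above can be sketched as a small helper. The format labels are illustrative assumptions for this sketch, not values from the KIE API:

```python
# Sketch: pick a KIE model from the content format.
# The format labels below are assumptions, not KIE API parameters.
def pick_kie_model(content_format: str) -> str:
    broll_formats = {"b-roll", "lifestyle", "cinematic"}
    if content_format.lower() in broll_formats:
        return "Veo 3.1"        # better visuals, weaker speech reliability
    return "Kling 3.0 Pro"      # default: most reliable for speech

print(pick_kie_model("talking-head"))  # Kling 3.0 Pro
print(pick_kie_model("b-roll"))        # Veo 3.1
```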


### Wavespeed

Good for fast turnaround and async workflows with webhook callbacks.

1. Go to https://wavespeed.ai/dashboard
2. Create an account
3. Navigate to API Keys in the dashboard
4. Generate a new API key
5. Set the environment variable:

   ```sh
   export WAVESPEED_API_KEY=your_key_here
   ```

Models:

| Model | Best for | Notes |
| --- | --- | --- |
| Veo 3.1 Fast | Quick iterations, testing | Faster but lower quality than KIE |
| Kling 2.1 | Budget-friendly talking-head | Older model, acceptable quality |

### Fal

Wide model selection, serverless infrastructure, pay-per-second pricing.

1. Go to https://fal.ai
2. Sign up and navigate to Keys in the dashboard
3. Create a new API key
4. Set the environment variable:

   ```sh
   export FAL_KEY=your_key_here
   ```

Models: Check available models at fal.ai/models. Filter by "video generation". Model availability changes frequently — the agent should check what's currently available.


### Higgsfield

AI video and image generation platform with access to multiple models (Kling 3.0, Sora 2, Veo 3.1, WAN 2.6) through a single interface. Includes lipsync studio, motion control, and character consistency tools.

1. Go to https://higgsfield.ai
2. Sign up and create a workspace
3. Navigate to API settings
4. Generate an API key
5. Set the environment variable:

   ```sh
   export HIGGSFIELD_API_KEY=your_key_here
   ```

Models:

| Model | Best for | Notes |
| --- | --- | --- |
| Kling 3.0 | Talking-head, up to 15s clips | Available through Higgsfield's unified API |
| Sora 2 | High-quality cinematic content | OpenAI's video model via Higgsfield |
| Veo 3.1 | Lifestyle, b-roll | Google's model via Higgsfield |
| WAN 2.6 | General purpose | Good all-rounder |

Higgsfield also offers lipsync studio and motion control for character actions up to 30 seconds — useful for longer-form talking-head content.


### Replicate

Access to open-source and community models. Pay-per-second pricing.

1. Go to https://replicate.com
2. Sign up and navigate to Account Settings
3. Copy your API token
4. Set the environment variable:

   ```sh
   export REPLICATE_API_TOKEN=your_key_here
   ```

Models: Check available models at replicate.com/collections/video-generation. Community models vary in quality — test with a single scene before committing to a full production run.


## ElevenLabs — Audio

ElevenLabs handles speech verification and voice consistency across scenes.

- Speech-to-Text (STT) — verifies generated speech matches intended dialogue (QA gate)
- Speech-to-Speech (STS) — voice swap on the final video for consistent voice across scenes
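
The STT QA gate can be sketched as a string comparison: normalize the intended dialogue and the transcript, then accept if they are similar enough. The normalization rules and the 0.9 threshold are assumptions for illustration; the transcript itself would come from ElevenLabs STT (scribe_v1):

```python
import difflib
import re

def dialogue_matches(intended: str, transcript: str, threshold: float = 0.9) -> bool:
    """QA gate sketch: does the transcribed speech match the intended line?

    The normalization and 0.9 threshold are illustrative assumptions,
    not the video-director pipeline's actual settings.
    """
    def normalize(text: str) -> str:
        # Lowercase and strip punctuation so "Hello!" matches "hello".
        return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

    ratio = difflib.SequenceMatcher(
        None, normalize(intended), normalize(transcript)
    ).ratio()
    return ratio >= threshold

print(dialogue_matches("Welcome to the demo!", "welcome to the demo"))  # True
```

A failing gate would trigger a regeneration of that scene rather than passing the clip downstream.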

### Sign Up

1. Go to https://elevenlabs.io
2. Sign up and navigate to Profile Settings
3. Copy your API key
4. Set the environment variable:

   ```sh
   export ELEVENLABS_API_KEY=your_key_here
   ```

### Models

| Model | Used for | Notes |
| --- | --- | --- |
| scribe_v1 | Speech-to-Text (STT) | Verifies dialogue accuracy per scene |
| eleven_english_sts_v2 | Speech-to-Speech (STS) | Final voice swap on concatenated video |

### Verify

After setting keys, verify:

```sh
echo $ELEVENLABS_API_KEY    # required
echo $KIE_API_KEY           # or whichever video provider was chosen
```

If a variable is empty, add it to `~/.zshrc`, `~/.bashrc`, or a `.env` file in the project root.
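
The same check can be scripted. This sketch uses the key names set above and fails if ELEVENLABS_API_KEY is missing or no video provider key is present:

```python
import os
import sys

# Key names match the provider setup steps above.
VIDEO_KEYS = [
    "KIE_API_KEY", "WAVESPEED_API_KEY", "FAL_KEY",
    "HIGGSFIELD_API_KEY", "REPLICATE_API_TOKEN",
]

def check_keys(env=os.environ) -> list:
    """Return a list of problems; an empty list means the environment is ready."""
    problems = []
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set (required)")
    if not any(env.get(k) for k in VIDEO_KEYS):
        problems.append("no video provider key set (need one of: " + ", ".join(VIDEO_KEYS) + ")")
    return problems

if __name__ == "__main__":
    issues = check_keys()
    for issue in issues:
        print("MISSING:", issue)
    sys.exit(1 if issues else 0)
```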


## Key Behaviors

- ElevenLabs is always required. It handles STT verification and voice swap — no alternative.
- Ask the user which video provider they prefer. Don't assume.
- KIE is the default recommendation — most reliable for speech-heavy content.
- Wavespeed is a good alternative for async workflows with webhook support.
- Fal and Replicate give access to more models but require the agent to handle model-specific parameters.
- The video-director pipeline works the same regardless of video provider. The agent sends a prompt, gets a clip, runs QA.
- If a provider consistently fails QA gates, suggest switching providers or models before burning more credits.
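
Because the pipeline treats every provider identically (prompt in, clip out, QA), a provider-agnostic interface is one way to structure the agent's code. This Protocol and the dummy provider are purely illustrative sketches, not part of the actual video-director implementation:

```python
from typing import Protocol

class VideoProvider(Protocol):
    """Illustrative interface; the method name is an assumption,
    not the actual video-director API."""
    def generate_clip(self, prompt: str) -> bytes: ...

class DummyProvider:
    """Stand-in provider for exercising the pipeline shape without credits."""
    def generate_clip(self, prompt: str) -> bytes:
        return f"clip for: {prompt}".encode()

def run_scene(provider: VideoProvider, prompt: str) -> bytes:
    clip = provider.generate_clip(prompt)
    # A real pipeline would run the STT QA gate here before accepting the clip.
    return clip

print(run_scene(DummyProvider(), "talking head intro"))
```

Swapping KIE for Wavespeed (or any other provider) then only changes which class implements `generate_clip`.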