fal-generate Skill
Overview
This skill enables AI content generation through fal.ai's latest models using a queue-based system. It supports:
- Text-to-Image - Generate images from text prompts
- Image-to-Image - Edit and transform existing images
- Text-to-Video - Create videos from text descriptions
- Image-to-Video - Animate images into videos
- Text-to-Speech - Generate natural speech from text
- Speech-to-Text - Transcribe audio to text
- Text-to-3D - Create 3D models from text
- Image-to-3D - Convert images to 3D models
- LLM / VLM / ALM - Run any LLM, vision, audio or video model via OpenRouter
Scripts
| Script | Purpose |
|---|---|
| `scripts/generate.sh` | Main generation tool with queue management |
| `scripts/upload.sh` | Upload files to fal CDN (returns URL) |
| `scripts/poll.sh` | Poll queue status until completion |
| `scripts/models.sh` | Search and discover models |
Prerequisites
export FAL_KEY="your-api-key"
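Since every script depends on `FAL_KEY`, a wrapper can fail early when the variable is missing. The following is a minimal sketch; `require_fal_key` is a hypothetical helper name and is not part of the skill's scripts.

```bash
# Hypothetical helper (not shipped with the skill):
# return non-zero with a hint if FAL_KEY is unset or empty.
require_fal_key() {
  if [ -z "${FAL_KEY:-}" ]; then
    echo "FAL_KEY is not set; run: export FAL_KEY=\"your-api-key\"" >&2
    return 1
  fi
}
```

It could be chained in front of any call, for example `require_fal_key && ./scripts/generate.sh -m fal-ai/z-image/turbo -p "A robot" -w`.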
Output Format
All generation scripts output JSON to stdout when using --wait. The JSON contains URLs to the generated content:
- Images: `{"images": [{"url": "https://fal.media/files/...", "width": 1024, "height": 1024}]}`
- Videos: `{"video": {"url": "https://fal.media/files/...mp4"}}`
- Audio/TTS: `{"audio": {"url": "https://fal.media/files/...mp3"}}` or `{"audio_url": "https://..."}`
- 3D Models: `{"model_mesh": {"url": "https://fal.media/files/...glb"}}`
- Transcription: `{"text": "transcribed content..."}`
- OpenRouter: `{"output": "LLM response text..."}`
Without --wait or --async, the script prints the request ID plus a manual curl command for checking the result. With --async, it prints only the request ID for later polling.
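Because the URL sits under a different key for each content type, a small filter can pull it out regardless of shape. This is a rough sketch that pattern-matches rather than parses JSON; `extract_urls` is a hypothetical name, and it assumes the URLs contain no double quotes. A real JSON parser would be more robust.

```bash
# Hypothetical helper: print every http(s) URL found in the JSON on stdin,
# one per line. This is a plain pattern match, not a JSON parser.
# grep exits non-zero when the input contains no URL.
extract_urls() {
  grep -oE 'https?://[^"]+'
}
```

Assuming the `--wait` output on stdout is only the JSON described above, it could be used as `./scripts/generate.sh -m fal-ai/z-image/turbo -p "A robot" -w | extract_urls`.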
Examples by Category
Text-to-Image
# Basic image generation
./scripts/generate.sh -m fal-ai/kling-image/v3/text-to-image \
-p "A majestic mountain at sunrise, cinematic lighting" -w
# With aspect ratio and seed
./scripts/generate.sh -m fal-ai/flux-2/klein/9b \
-p "Professional headshot, studio lighting" \
--aspect-ratio "1:1" --seed 42 -w
# Ultra-fast generation
./scripts/generate.sh -m fal-ai/z-image/turbo \
-p "Quick concept sketch of a robot" -w
# With custom parameters (inference steps, guidance scale)
./scripts/generate.sh -m fal-ai/flux-2/klein/9b \
-p "Detailed portrait of a scientist" \
--param num_inference_steps=28 --param guidance_scale=3.5 -w
Image-to-Image (Edit/Transform)
# Upload local image first
IMAGE_URL=$(./scripts/upload.sh ~/photos/portrait.jpg)
# Edit with instructions
./scripts/generate.sh -m fal-ai/qwen-image-max/edit \
--image-url "$IMAGE_URL" \
-p "Make the background a sunset beach" -w
# Style transfer
./scripts/generate.sh -m fal-ai/glm-image/image-to-image \
--image-url "$IMAGE_URL" \
-p "Convert to oil painting style" -w
Text-to-Video
# Cinematic video with audio (Kling V3 Pro)
./scripts/generate.sh -m fal-ai/kling-video/v3/pro/text-to-video \
-p "A butterfly emerging from a cocoon in slow motion, macro lens" \
--duration 5 -w
# Fast video generation
./scripts/generate.sh -m fal-ai/ltx-2-19b/distilled/text-to-video \
-p "Drone shot flying over a city at golden hour" -w
# Google Veo 3.1 with sound
./scripts/generate.sh -m fal-ai/veo3.1 \
-p "A cat playing piano, realistic" -w
Image-to-Video (Animate Images)
IMAGE_URL=$(./scripts/upload.sh ~/photos/landscape.jpg)
# Animate a still photo
./scripts/generate.sh -m fal-ai/kling-video/o3/pro/image-to-video \
--image-url "$IMAGE_URL" \
-p "Gentle wind moving through the trees, clouds drifting" -w
# Lip-sync avatar from image + audio
AUDIO_URL=$(./scripts/upload.sh ~/audio/speech.mp3)
./scripts/generate.sh -m fal-ai/longcat-multi-avatar/image-audio-to-video \
--image-url "$IMAGE_URL" --audio-url "$AUDIO_URL" -w
Text-to-Speech
# High-quality TTS (MiniMax)
./scripts/generate.sh -m fal-ai/minimax/speech-2.8-hd \
-t "Hello! Welcome to the future of AI-generated content." -w
# Fast TTS
./scripts/generate.sh -m fal-ai/minimax/speech-2.8-turbo \
-t "This is a quick test of fast speech generation." -w
# Custom voice with Qwen-3 TTS
./scripts/generate.sh -m fal-ai/qwen-3-tts/text-to-speech/1.7b \
-t "Custom voice synthesis with natural intonation." -w
Voice Cloning
# Upload a voice sample (10+ seconds recommended)
VOICE_URL=$(./scripts/upload.sh ~/audio/voice-sample.wav)
# Clone and generate speech
./scripts/generate.sh -m fal-ai/qwen-3-tts/clone-voice/1.7b \
--audio-url "$VOICE_URL" \
-t "This sentence will be spoken in the cloned voice." -w
Speech-to-Text (Transcription)
AUDIO_URL=$(./scripts/upload.sh ~/recordings/meeting.mp3)
# Fast transcription
./scripts/generate.sh -m fal-ai/nemotron/asr \
--audio-url "$AUDIO_URL" -w
# Accurate transcription with timestamps
./scripts/generate.sh -m fal-ai/elevenlabs/speech-to-text/scribe-v2 \
--audio-url "$AUDIO_URL" -w
Text-to-3D
# Detailed 3D model from text
./scripts/generate.sh -m fal-ai/hunyuan-3d/v3.1/pro/text-to-3d \
-p "A detailed medieval sword with ornate handle" -w
# Fast 3D generation
./scripts/generate.sh -m fal-ai/hunyuan-3d/v3.1/rapid/text-to-3d \
-p "Simple wooden chair" -w
Image-to-3D
IMAGE_URL=$(./scripts/upload.sh ~/photos/object.jpg)
# Convert image to 3D model
./scripts/generate.sh -m fal-ai/hunyuan-3d/v3.1/rapid/image-to-3d \
--image-url "$IMAGE_URL" -w
# High-fidelity geometry
./scripts/generate.sh -m fal-ai/ultrashape \
--image-url "$IMAGE_URL" -w
OpenRouter — Run Any LLM
# Text chat with any LLM (GPT-5, Claude, Gemini, Llama 4, etc.)
./scripts/generate.sh -m openrouter/router \
-p "Explain quantum computing in simple terms" \
--param model=google/gemini-2.5-flash -w
# Vision — analyze an image
IMAGE_URL=$(./scripts/upload.sh ~/photos/chart.png)
./scripts/generate.sh -m openrouter/router/vision \
--image-url "$IMAGE_URL" \
-p "Describe what you see in this image" \
--param model=google/gemini-2.5-flash -w
# Audio — process audio with an ALM
AUDIO_URL=$(./scripts/upload.sh ~/audio/podcast.mp3)
./scripts/generate.sh -m openrouter/router/audio \
--audio-url "$AUDIO_URL" \
-p "Summarize the key points discussed" \
--param model=google/gemini-2.5-flash -w
# Video — analyze a video
VIDEO_URL=$(./scripts/upload.sh ~/videos/demo.mp4)
./scripts/generate.sh -m openrouter/router/video \
--video-url "$VIDEO_URL" \
-p "Describe what happens in this video" \
--param model=google/gemini-2.5-flash -w
Usage Patterns
Queue Mode (Default) — submit and poll
./scripts/generate.sh -m fal-ai/flux-2/klein/9b -p "Portrait" --wait
Async Mode — get ID, poll later
REQUEST_ID=$(./scripts/generate.sh -m fal-ai/kling-video/v3/pro/text-to-video \
-p "Drone flying over a city" --async)
./scripts/poll.sh fal-ai/kling-video/v3/pro/text-to-video "$REQUEST_ID"
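The async pattern above can be wrapped so that an empty request ID is caught before polling. A minimal sketch, assuming `--async` prints only the request ID on stdout and `poll.sh` takes the endpoint followed by the ID, as in the example; `submit_and_poll` and the `GENERATE`/`POLL` variables are hypothetical names.

```bash
# Paths to the skill's scripts; override if the checkout lives elsewhere.
GENERATE="${GENERATE:-./scripts/generate.sh}"
POLL="${POLL:-./scripts/poll.sh}"

# Hypothetical wrapper: submit_and_poll <endpoint> <prompt>
# Submits in async mode, checks that an ID came back, then polls for the result.
submit_and_poll() {
  local model="$1" prompt="$2" request_id
  request_id=$("$GENERATE" -m "$model" -p "$prompt" --async) || return 1
  if [ -z "$request_id" ]; then
    echo "no request ID returned for $model" >&2
    return 1
  fi
  "$POLL" "$model" "$request_id"
}
```

For example: `submit_and_poll fal-ai/kling-video/v3/pro/text-to-video "Drone flying over a city"`.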
File Upload — local files to fal CDN
IMAGE_URL=$(./scripts/upload.sh ~/photos/portrait.jpg)
./scripts/generate.sh -m fal-ai/kling-video/o3/pro/image-to-video \
--image-url "$IMAGE_URL" -p "Gentle wind blowing through hair" -w
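When an input may be either a local path or an existing URL, the upload step can be made conditional. A sketch under the assumption that `upload.sh` prints the CDN URL on stdout, as the Scripts table states; `resolve_input` and the `UPLOAD` variable are hypothetical names.

```bash
# Path to the skill's upload script; override if the checkout lives elsewhere.
UPLOAD="${UPLOAD:-./scripts/upload.sh}"

# Hypothetical helper: resolve_input <path-or-url>
# Passes http(s) URLs through unchanged and uploads anything else as a local file.
resolve_input() {
  case "$1" in
    http://*|https://*) printf '%s\n' "$1" ;;
    *) "$UPLOAD" "$1" ;;
  esac
}
```

Usage would mirror the example above: `IMAGE_URL=$(resolve_input ~/photos/portrait.jpg)`, then pass `"$IMAGE_URL"` to `--image-url`.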
Common Parameters
| Parameter | Description | Example |
|---|---|---|
| `-m`, `--model` | Model endpoint (required) | fal-ai/kling-image/v3/text-to-image |
| `-p`, `--prompt` | Text description | "A sunset over mountains" |
| `-t`, `--text` | Text for TTS models | "Hello world" |
| `--image-url` | Input image URL | "https://..." |
| `--video-url` | Input video URL | "https://..." |
| `--audio-url` | Input audio URL | "https://..." |
| `--aspect-ratio` | Output ratio | "16:9", "9:16", "1:1" |
| `--duration` | Video length (sec) | 5, 10 |
| `--seed` | Reproducibility | 12345 |
| `-w`, `--wait` | Poll until done | (flag) |
| `-a`, `--async` | Return ID only | (flag) |
| `--param` | Extra param (repeatable) | num_inference_steps=28 |
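Since `--param` is repeatable, a wrapper can expand a list of `key=value` pairs into one flag each. The sketch below is illustrative only; `generate_with` and the `GENERATE` variable are hypothetical names, and it uses only the flags listed in the table.

```bash
# Path to the skill's generation script; override if the checkout lives elsewhere.
GENERATE="${GENERATE:-./scripts/generate.sh}"

# Hypothetical wrapper: generate_with <endpoint> <prompt> [key=value ...]
generate_with() {
  local model="$1" prompt="$2" kv
  shift 2
  # Rewrite the remaining arguments as: --param k1=v1 --param k2=v2 ...
  # Each pass drops one original argument and appends its flag pair.
  for kv in "$@"; do
    shift
    set -- "$@" --param "$kv"
  done
  "$GENERATE" -m "$model" -p "$prompt" "$@" --wait
}
```

For example, `generate_with fal-ai/flux-2/klein/9b "Detailed portrait of a scientist" num_inference_steps=28 guidance_scale=3.5` would run the same command as the custom-parameters example earlier. Values containing spaces survive because each pair stays a single quoted argument.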
Environment Variables
| Variable | Required | Description |
|---|---|---|
| `FAL_KEY` | Yes | API authentication key |
| `FAL_WEBHOOK` | No | Webhook URL for callbacks |
Tips
- Always use `--wait` or `--async`: without either, you get the request ID plus a manual curl command
- Use `--param` for advanced control: pass any model-specific parameter, e.g. `--param guidance_scale=7.5`
- Check the model schema: run `./scripts/models.sh --schema <endpoint>` to see all available params
- Upload files first: use `./scripts/upload.sh` for local images/audio/video before generation
- Use seeds: same seed = same output for reproducible results
- Pro vs Standard: Pro = better quality + longer generation; Standard = cost-effective
- Flash/Turbo/Distilled: best for previews and fast iterations
Model Catalog
Image Generation — February 2026
| Endpoint | Description |
|---|---|
| `fal-ai/kling-image/v3/text-to-image` | Kling V3: Latest Kling image model |
| `fal-ai/kling-image/v3/image-to-image` | Kling V3 image transformation |
| `fal-ai/kling-image/o3/text-to-image` | Kling Omni 3: Top-tier consistency |
| `fal-ai/kling-image/o3/image-to-image` | Kling Omni 3 image editing |
| `xai/grok-imagine-image` | xAI Grok Imagine: Highly aesthetic |
| `xai/grok-imagine-image/edit` | Grok Imagine editing |
| `fal-ai/hunyuan-image/v3/instruct/text-to-image` | Hunyuan 3.0 Instruct |
| `fal-ai/hunyuan-image/v3/instruct/edit` | Hunyuan 3.0 editing |
| `fal-ai/qwen-image-max/text-to-image` | Qwen Image Max: Enhanced realism |
| `fal-ai/qwen-image-max/edit` | Qwen Image Max editing |
| `fal-ai/z-image/base` | Z-Image Base: 6B fast model |
Image Generation — January 2026
| Endpoint | Description |
|---|---|
| `fal-ai/flux-2/klein/9b` | FLUX.2 Klein 9B: Photorealism & text |
| `fal-ai/flux-2/klein/9b/edit` | FLUX.2 Klein 9B editing |
| `fal-ai/flux-2/klein/9b/base/lora` | FLUX.2 Klein 9B with LoRA |
| `fal-ai/flux-2/klein/4b` | FLUX.2 Klein 4B: Lightweight |
| `fal-ai/glm-image` | GLM Image: Accurate text rendering |
| `bria/fibo-edit/edit` | Bria Fibo Edit: Multi-tool editing |
| `bria/fibo-edit/blend` | Bria Fibo composition |
| `bria/fibo-edit/relight` | Bria Fibo relighting |
| `bria/fibo-edit/restyle` | Bria Fibo artistic styles |
| `bria/fibo-lite/generate` | Bria Fibo Lite: Fast generation |
| `imagineart/imagineart-1.5-pro-preview/text-to-image` | ImagineArt 1.5 Pro: 4K |
Image Generation — December 2025
| Endpoint | Description |
|---|---|
| `fal-ai/flux-2-max` | FLUX.2 Max: State-of-the-art |
| `fal-ai/flux-2/turbo` | FLUX.2 Turbo: Fast generation |
| `fal-ai/flux-2/flash` | FLUX.2 Flash: Ultra-fast |
| `fal-ai/gpt-image-1.5` | GPT Image 1.5: Strong prompt adherence |
| `fal-ai/bytedance/seedream/v4.5/text-to-image` | Seedream 4.5: ByteDance |
| `fal-ai/z-image/turbo` | Z-Image Turbo: 6B super fast |
| `fal-ai/qwen-image-2512` | Qwen Image 2512 |
Video Generation — February 2026
| Endpoint | Description |
|---|---|
| `fal-ai/kling-video/v3/pro/text-to-video` | Kling 3.0 Pro: Cinematic + audio |
| `fal-ai/kling-video/v3/standard/text-to-video` | Kling 3.0 Standard |
| `fal-ai/kling-video/v3/pro/image-to-video` | Kling 3.0 Pro I2V |
| `fal-ai/kling-video/v3/standard/image-to-video` | Kling 3.0 Standard I2V |
| `fal-ai/kling-video/o3/pro/text-to-video` | Kling O3 Pro: Realistic |
| `fal-ai/kling-video/o3/pro/image-to-video` | Kling O3 Pro I2V |
| `fal-ai/kling-video/o3/pro/reference-to-video` | Kling O3 character consistency |
| `xai/grok-imagine-video/text-to-video` | Grok Video with audio |
| `xai/grok-imagine-video/image-to-video` | Grok Video I2V |
Video Generation — January 2026
| Endpoint | Description |
|---|---|
| `fal-ai/vidu/q3/text-to-video` | Vidu Q3 T2V |
| `fal-ai/vidu/q3/image-to-video` | Vidu Q3 I2V |
| `fal-ai/pixverse/v5.6/text-to-video` | Pixverse V5.6 T2V |
| `fal-ai/pixverse/v5.6/image-to-video` | Pixverse V5.6 I2V |
| `fal-ai/ltx-2-19b/text-to-video` | LTX-2 19B: Video + audio |
| `fal-ai/ltx-2-19b/image-to-video` | LTX-2 19B I2V |
| `fal-ai/ltx-2-19b/distilled/text-to-video` | LTX-2 Distilled: Fast |
| `fal-ai/longcat-multi-avatar/image-audio-to-video` | LongCat: Lip-sync avatar |
Video Generation — December 2025
| Endpoint | Description |
|---|---|
| `fal-ai/veo3.1` | Veo 3.1: Google's best + sound |
| `fal-ai/veo3.1/fast` | Veo 3.1 Fast |
| `fal-ai/veo3.1/image-to-video` | Veo 3.1 I2V |
| `fal-ai/veo3.1/extend-video` | Veo 3.1 Extend: Up to 30s |
| `fal-ai/hunyuan-video-v1.5/text-to-video` | Hunyuan Video 1.5 T2V |
| `fal-ai/bytedance/seedance/v1.5/pro/text-to-video` | Seedance 1.5 Pro |
| `fal-ai/kandinsky5-pro/text-to-video` | Kandinsky 5 Pro |
| `fal-ai/live-avatar` | Live Avatar: Real-time |
| `clarityai/crystal-video-upscaler` | Crystal Video Upscaler |
| `fal-ai/creatify/aurora` | Creatify Aurora: Studio avatars |
Audio — February 2026
| Endpoint | Description |
|---|---|
| `fal-ai/minimax/speech-2.8-hd` | MiniMax 2.8 HD: Best TTS |
| `fal-ai/minimax/speech-2.8-turbo` | MiniMax 2.8 Turbo: Fast TTS |
Audio — January 2026
| Endpoint | Description |
|---|---|
| `fal-ai/qwen-3-tts/text-to-speech/1.7b` | Qwen-3 TTS 1.7B: Custom voices |
| `fal-ai/qwen-3-tts/text-to-speech/0.6b` | Qwen-3 TTS 0.6B: Lightweight |
| `fal-ai/qwen-3-tts/clone-voice/1.7b` | Qwen-3 Voice Clone: Zero-shot |
| `fal-ai/qwen-3-tts/clone-voice/0.6b` | Qwen-3 Voice Clone Light |
| `fal-ai/qwen-3-tts/voice-design/1.7b` | Qwen-3 Voice Design |
| `fal-ai/nemotron/asr` | Nemotron ASR: Fast STT |
| `fal-ai/nemotron/asr/stream` | Nemotron ASR Streaming |
| `fal-ai/elevenlabs/voice-changer` | ElevenLabs Voice Changer |
| `fal-ai/elevenlabs/speech-to-text/scribe-v2` | ElevenLabs Scribe V2 |
| `fal-ai/deepfilternet3` | DeepFilterNet3: Noise removal |
Audio — December 2025
| Endpoint | Description |
|---|---|
| `fal-ai/sam-audio/separate` | SAM Audio: Text-guided separation |
| `fal-ai/elevenlabs/music` | ElevenLabs Music |
| `fal-ai/maya/batch` | Maya: Expressive voice |
| `fal-ai/demucs` | Demucs: SOTA stemming |
| `fal-ai/index-tts-2/text-to-speech` | Index TTS 2.0 |
3D Generation — February 2026
| Endpoint | Description |
|---|---|
| `fal-ai/hunyuan-3d/v3.1/pro/text-to-3d` | Hunyuan 3D V3.1 Pro: Text to 3D |
| `fal-ai/hunyuan-3d/v3.1/pro/image-to-3d` | Hunyuan 3D V3.1 Pro: Image to 3D |
| `fal-ai/hunyuan-3d/v3.1/rapid/text-to-3d` | Hunyuan 3D Rapid: Fast |
| `fal-ai/hunyuan-3d/v3.1/rapid/image-to-3d` | Hunyuan 3D Rapid I2-3D |
| `fal-ai/ultrashape` | UltraShape: High-fidelity geometry |
3D Generation — December 2025
| Endpoint | Description |
|---|---|
| `fal-ai/trellis-2` | Trellis 2: Versatile 3D |
| `fal-ai/hunyuan3d-v3/text-to-3d` | Hunyuan 3D V3 |
| `fal-ai/hunyuan-motion` | Hunyuan Motion: 3D animation |
| `fal-ai/meshy/v6-preview/text-to-3d` | Meshy V6 Preview |
OpenRouter Endpoints
Access 100+ LLMs via OpenRouter. Use --param model=<provider/model> to select the model.
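Because the model is just a parameter, the same prompt can be sent to several models in a loop. A minimal sketch using only the `openrouter/router` endpoint and flags shown in this document; `ask_models` and the `GENERATE` variable are hypothetical names.

```bash
# Path to the skill's generation script; override if the checkout lives elsewhere.
GENERATE="${GENERATE:-./scripts/generate.sh}"

# Hypothetical helper: ask_models <prompt> <provider/model> [<provider/model> ...]
# Sends the same prompt to each model in turn and labels each reply.
ask_models() {
  local prompt="$1" model
  shift
  for model in "$@"; do
    printf '== %s ==\n' "$model"
    "$GENERATE" -m openrouter/router -p "$prompt" --param "model=$model" --wait
  done
}
```

For example, `ask_models "Explain quantum computing in simple terms" google/gemini-2.5-flash` followed by any further `<provider/model>` IDs to compare.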
Text (LLM)
| Endpoint | Description |
|---|---|
| `openrouter/router` | Any LLM: GPT-5, Claude, Gemini, Llama 4, Mistral |
| `openrouter/router/stream` | LLM with streaming |
| `openrouter/router/enterprise` | Enterprise LLM (enhanced SLA) |
| `openrouter/router/enterprise/stream` | Enterprise LLM streaming |
Vision (VLM)
| Endpoint | Description |
|---|---|
| `openrouter/router/vision` | Any VLM: Image analysis with GPT-5, Gemini, Claude |
| `openrouter/router/vision/stream` | Vision streaming |
| `openrouter/router/vision/enterprise` | Enterprise vision |
| `openrouter/router/vision/enterprise/stream` | Enterprise vision streaming |
Audio (ALM)
| Endpoint | Description |
|---|---|
| `openrouter/router/audio` | Any ALM: Audio analysis with Gemini |
| `openrouter/router/audio/stream` | Audio streaming |
| `openrouter/router/audio/enterprise` | Enterprise audio |
| `openrouter/router/audio/enterprise/stream` | Enterprise audio streaming |
Video (VLM)
| Endpoint | Description |
|---|---|
| `openrouter/router/video` | Any Video LM: Video analysis with Gemini |
| `openrouter/router/video/stream` | Video streaming |
| `openrouter/router/video/enterprise` | Enterprise video |
| `openrouter/router/video/enterprise/stream` | Enterprise video streaming |
OpenAI-Compatible
| Endpoint | Description |
|---|---|
| `openrouter/router/openai/v1/chat/completions` | OpenAI Chat Completions API |
| `openrouter/router/openai/v1/responses` | OpenAI Responses API |
| `openrouter/router/openai/v1/embeddings` | OpenAI Embeddings API |
Model Selection Guide
| Use Case | Recommended |
|---|---|
| Best image | fal-ai/kling-image/o3/text-to-image |
| Fastest image | fal-ai/z-image/turbo |
| Photorealistic | fal-ai/flux-2/klein/9b |
| Image editing | fal-ai/qwen-image-max/edit |
| Best video | fal-ai/kling-video/v3/pro/text-to-video |
| Fastest video | fal-ai/ltx-2-19b/distilled/text-to-video |
| Video + audio | xai/grok-imagine-video/text-to-video |
| Animate image | fal-ai/kling-video/o3/pro/image-to-video |
| Best TTS | fal-ai/minimax/speech-2.8-hd |
| Voice clone | fal-ai/qwen-3-tts/clone-voice/1.7b |
| Transcription | fal-ai/nemotron/asr |
| 3D from text | fal-ai/hunyuan-3d/v3.1/pro/text-to-3d |
| 3D from image | fal-ai/hunyuan-3d/v3.1/rapid/image-to-3d |
| Any LLM | openrouter/router |
| Vision/Audio | openrouter/router/vision or /audio |
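For scripted use, the recommendations in this table can be encoded as a lookup. The sketch below copies the endpoints from the table; the function name `recommend_model` and the hyphenated use-case keys are hypothetical and chosen only for illustration.

```bash
# Hypothetical helper: map a use-case key to the endpoint recommended
# in the Model Selection Guide. Unknown keys return non-zero.
recommend_model() {
  case "$1" in
    best-image)     echo "fal-ai/kling-image/o3/text-to-image" ;;
    fastest-image)  echo "fal-ai/z-image/turbo" ;;
    photorealistic) echo "fal-ai/flux-2/klein/9b" ;;
    image-editing)  echo "fal-ai/qwen-image-max/edit" ;;
    best-video)     echo "fal-ai/kling-video/v3/pro/text-to-video" ;;
    fastest-video)  echo "fal-ai/ltx-2-19b/distilled/text-to-video" ;;
    video-audio)    echo "xai/grok-imagine-video/text-to-video" ;;
    animate-image)  echo "fal-ai/kling-video/o3/pro/image-to-video" ;;
    best-tts)       echo "fal-ai/minimax/speech-2.8-hd" ;;
    voice-clone)    echo "fal-ai/qwen-3-tts/clone-voice/1.7b" ;;
    transcription)  echo "fal-ai/nemotron/asr" ;;
    3d-from-text)   echo "fal-ai/hunyuan-3d/v3.1/pro/text-to-3d" ;;
    3d-from-image)  echo "fal-ai/hunyuan-3d/v3.1/rapid/image-to-3d" ;;
    any-llm)        echo "openrouter/router" ;;
    *) echo "unknown use case: $1" >&2; return 1 ;;
  esac
}
```

A call such as `./scripts/generate.sh -m "$(recommend_model fastest-image)" -p "Quick concept sketch of a robot" -w` would then pick the endpoint by use case.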
Repository: lovisdotio/skills-fal-ai
First seen: Feb 6, 2026