acestep-simplemv

MV Render

Render music videos with waveform visualization and synced lyrics from audio + lyrics input.

Prerequisites

  • Remotion project in the scripts/ directory of this skill
  • Node.js + npm dependencies installed
  • ffprobe available (for audio duration detection)

First-Time Setup

Before first use, check and install dependencies:

# 1. Check Node.js
node --version

# 2. Install npm dependencies
cd {project_root}/{.claude or .codex}/skills/acestep-simplemv/scripts && npm install

# 3. Check ffprobe
ffprobe -version

If ffprobe is not available, install ffmpeg (which includes ffprobe):

  • Windows: choco install ffmpeg or download from https://ffmpeg.org/download.html and add to PATH
  • macOS: brew install ffmpeg
  • Linux: sudo apt-get install ffmpeg (Debian/Ubuntu) or sudo dnf install ffmpeg (Fedora)
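
The checks above can be combined into one preflight step. The following is a minimal sketch, not part of the skill itself: the helper name missing_tools is hypothetical, and the tool list (node, npm, ffprobe) is taken from the setup steps above.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list
# that cannot be found on PATH, one name per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# node, npm, and ffprobe are the tools named in the setup steps.
missing=$(missing_tools node npm ffprobe)
if [ -n "$missing" ]; then
  # $missing is left unquoted so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If anything is reported as missing, install it as described above before running npm install or the render script.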

Quick Start

cd {project_root}/{.claude or .codex}/skills/acestep-simplemv/
./scripts/render-mv.sh --audio /path/to/song.mp3 --lyrics /path/to/song.lrc --title "Song Title"

Output: MP4 file at out/<audio_basename>.mp4 (or custom --output path).

Script Usage

./scripts/render-mv.sh --audio <file> --lyrics <lrc_file> --title "Title" [options]

Options:
  --audio        Audio file path (absolute paths supported)
  --lyrics       LRC format lyrics file (timestamped)
  --lyrics-json  JSON lyrics file [{start, end, text}] (alternative to --lyrics)
  --title        Video title (default: "Music Video")
  --subtitle     Subtitle text
  --credit       Bottom credit text
  --offset       Lyric timing offset in seconds (default: -0.5)
  --output       Output file path (default: out/<audio_basename>.mp4)
  --codec        h264|h265|vp8|vp9 (default: h264)
  --background   Background image file path (if omitted, uses animated gradient)
  --browser      Custom browser executable path (Chrome/Edge/Chromium)
  --max-size     Max output file size in MB (e.g. 24). Auto-compresses if exceeded.
                 Use for IM platforms (WhatsApp ≤ 16 MB, Discord ≤ 25 MB, Telegram ≤ 50 MB)

Environment variables:
  BROWSER_EXECUTABLE  Path to browser executable (overrides auto-detection)
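
To get a feel for what --offset does, the sketch below converts an LRC timestamp (mm:ss.xx) to seconds and applies an offset. It assumes the offset is simply added to each lyric timestamp, which this document does not spell out, and the helper name lrc_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert an LRC timestamp such as 01:23.45 to seconds
# and add an optional offset (assumption: the offset is added to the timestamp).
lrc_seconds() {
  LC_ALL=C awk -v ts="$1" -v off="${2:-0}" 'BEGIN {
    split(ts, part, ":")                      # part[1] = minutes, part[2] = seconds
    printf "%.2f\n", part[1] * 60 + part[2] + off
  }'
}

lrc_seconds 01:23.45        # prints 83.45
lrc_seconds 01:23.45 -0.5   # prints 82.95 (the default --offset value)
```

Under that assumption, a negative offset shows each lyric line slightly earlier than its timestamp.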

Browser Detection

Remotion requires a Chromium-based browser for rendering. The script auto-detects browsers in this priority order:

  1. BROWSER_EXECUTABLE environment variable
  2. --browser CLI argument
  3. Remotion cache (chrome-headless-shell, downloaded by Remotion)
  4. System Chrome (auto-uses --chrome-mode=chrome-for-testing)
  5. System Edge (pre-installed on Windows 10/11, auto-uses --chrome-mode=chrome-for-testing)
  6. System Chromium (auto-uses --chrome-mode=chrome-for-testing)

Important: Recent versions of Chrome/Edge removed the old headless mode. With a regular Chrome/Edge/Chromium install, the script automatically sets --chrome-mode=chrome-for-testing (which uses --headless=new). With chrome-headless-shell, it keeps the default headless-shell mode (which uses --headless=old). No manual configuration is needed.

If no browser is found, Remotion will attempt to download chrome-headless-shell from Google servers. This will fail if Google servers are inaccessible from your network.
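
The priority order and the mode selection described above could look roughly like the sketch below. It is an illustration only, not the code in render-mv.sh: the function names are hypothetical, the binary names searched on PATH are assumptions, and the Remotion cache step is omitted because its location is not given here.

```shell
#!/bin/sh
# Hypothetical sketch: resolve a browser path in the documented order
# (environment variable, then CLI value, then system browsers on PATH).
pick_browser() {
  cli_path=$1
  if [ -n "${BROWSER_EXECUTABLE:-}" ]; then
    printf '%s\n' "$BROWSER_EXECUTABLE"
  elif [ -n "$cli_path" ]; then
    printf '%s\n' "$cli_path"
  else
    # Assumed binary names; real installs may use different ones.
    for name in google-chrome microsoft-edge chromium; do
      if command -v "$name" >/dev/null 2>&1; then
        command -v "$name"
        return 0
      fi
    done
    return 1
  fi
}

# Choose the --chrome-mode value from the executable name.
chrome_mode_for() {
  case "$(basename "$1")" in
    chrome-headless-shell*) echo headless-shell ;;
    *)                      echo chrome-for-testing ;;
  esac
}

if browser=$(pick_browser ""); then
  echo "Using $browser with --chrome-mode=$(chrome_mode_for "$browser")"
else
  echo "No browser found; Remotion would try to download chrome-headless-shell."
fi
```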

Workarounds for restricted networks

Edge is pre-installed on Windows 10/11, so it is normally auto-detected and run in the correct headless mode without manual configuration. If auto-detection fails:

# Option 1: Set environment variable
export BROWSER_EXECUTABLE="/path/to/msedge.exe"

# Option 2: Pass as CLI argument
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --browser "/path/to/msedge.exe"

# Option 3: Enable proxy and let Remotion download chrome-headless-shell

Examples

# Basic render
./scripts/render-mv.sh --audio /tmp/abc123_1.mp3 --lyrics /tmp/abc123.lrc --title "夜桜"

# Custom output path
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "My Song" --output /tmp/my_mv.mp4

# With subtitle and credit
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --subtitle "Artist Name" --credit "Generated by ACE-Step"

# With background image
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --background /path/to/cover.jpg

# Compress for Discord upload (max 25MB)
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --max-size 24

# Compress for WhatsApp (max 16MB)
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --max-size 15

IM Platform Upload Limits

When sending an MV to chat platforms, use --max-size to auto-compress the output:

Platform       | Limit | Recommended --max-size
WhatsApp       | 16MB  | 15
Discord (free) | 25MB  | 24
Telegram       | 50MB  | 48
Slack (free)   | 1GB   | -

The compression uses ffmpeg two-pass encoding to achieve the best quality within the size constraint.
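
The arithmetic behind a size-constrained two-pass encode can be sketched as follows. This is not the script's actual implementation: the function names are hypothetical, the 128 kbit/s audio reserve and the libx264/aac codecs are assumptions, and 1 MB is treated as 8000 kbit.

```shell
#!/bin/sh
# Hypothetical helper: video bitrate (kbit/s) that fits size_mb into duration_s
# after reserving audio_kbps for the audio track. Integer arithmetic only.
video_kbps() {
  size_mb=$1
  duration_s=$2
  audio_kbps=${3:-128}
  echo $(( $size_mb * 8000 / $duration_s - $audio_kbps ))
}

# Two-pass encode at the computed bitrate (defined here, not run by this sketch).
two_pass_encode() {
  src=$1
  dst=$2
  kbps=$3
  ffmpeg -y -i "$src" -c:v libx264 -b:v "${kbps}k" -pass 1 -an -f null /dev/null &&
  ffmpeg -y -i "$src" -c:v libx264 -b:v "${kbps}k" -pass 2 -c:a aac -b:a 128k "$dst"
}

# 24 MB over 90 s: 24 * 8000 / 90 = 2133, minus 128 for audio = 2005
echo "24 MB over 90 s -> $(video_kbps 24 90) kbit/s for video"
```

The first pass only gathers statistics, so its output is discarded; the second pass uses those statistics to distribute the bitrate budget across the video.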

Container / Docker Font Support

When running in containers (e.g. OpenClaw), CJK fonts may not be pre-installed, causing lyrics to render as □ boxes. The script automatically:

  1. Detects if CJK fonts are available (via fc-list)
  2. Attempts to install fonts-noto-cjk (Debian/Ubuntu), font-noto-cjk (Alpine), or google-noto-sans-cjk-fonts (Fedora/RHEL)
  3. Falls back with a warning and manual install instructions if auto-install fails

If auto-install doesn't work, manually install fonts before rendering:

# Debian/Ubuntu
apt-get install -y fonts-noto-cjk

# Alpine
apk add font-noto-cjk

# Fedora/RHEL
dnf install -y google-noto-sans-cjk-fonts
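
The detection and package selection described in this section might be approximated as below. This is a sketch under assumptions, not the script's own logic: the function names are hypothetical, and the check only asks fontconfig for fonts covering Chinese (:lang=zh) as a stand-in for CJK coverage in general.

```shell
#!/bin/sh
# Hypothetical check: succeed if fontconfig reports at least one font for zh.
has_cjk_font() {
  command -v fc-list >/dev/null 2>&1 || return 1
  [ -n "$(fc-list :lang=zh family 2>/dev/null)" ]
}

# Map a package manager to the CJK font package named in this section.
cjk_font_package() {
  case "$1" in
    apt-get) echo fonts-noto-cjk ;;
    apk)     echo font-noto-cjk ;;
    dnf)     echo google-noto-sans-cjk-fonts ;;
    *)       return 1 ;;
  esac
}

if has_cjk_font; then
  echo "CJK font found."
else
  echo "No CJK font detected; lyrics may render as boxes."
fi
```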

File Naming

IMPORTANT: Name the output file after the audio file's job ID to avoid overwriting earlier renders. Do NOT use arbitrary custom names like --output my_song.mp4.

If --output is omitted, the default name is derived from the audio filename (out/<audio_basename>.mp4). To keep the video alongside its source files, pass --output explicitly:

  • Audio: acestep_output/{job_id}_1.mp3
  • Lyrics: acestep_output/{job_id}_1.lrc
  • Video: --output acestep_output/{job_id}.mp4 (use the job ID from the audio file)

Example: if audio is chatcmpl-abc123_1.mp3, pass --output acestep_output/chatcmpl-abc123.mp4
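
The naming rule can be derived mechanically from the audio path. A minimal sketch, assuming the audio file always ends in _<n> before its extension; the helper name mv_output_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the --output path from an audio path
# by dropping the extension and the trailing _<n> suffix.
mv_output_path() {
  audio=$1
  dir=$(dirname "$audio")
  base=$(basename "$audio")
  base=${base%.*}      # drop the extension:    chatcmpl-abc123_1
  job_id=${base%_*}    # drop the trailing _n:  chatcmpl-abc123
  printf '%s/%s.mp4\n' "$dir" "$job_id"
}

mv_output_path acestep_output/chatcmpl-abc123_1.mp3
# prints acestep_output/chatcmpl-abc123.mp4
```

The result can be passed straight to the script, for example --output "$(mv_output_path "$audio")".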

Title Guidelines

  • Keep --title short and single-line (max ~50 chars, auto-truncated)
  • Use --subtitle for additional info
  • Do NOT put newlines in --title

Good: --title "Open Source" --subtitle "ACE-Step v1.5"
Bad: --title "Open Source - ACE-Step v1.5\nCelebrating Music AI"

Notes

  • Audio files with absolute paths are auto-copied to public/ by render.mjs
  • Duration is auto-detected via ffprobe
  • Typical render time: ~1-2 minutes for a 90s song
  • Output resolution: 1920x1080, 30fps
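
For reference, duration detection and the resulting frame count at 30fps can be reproduced by hand. The sketch below is an assumption about how this could be done, not a copy of render.mjs: the function names are hypothetical, and rounding the frame count up is a choice made for this example.

```shell
#!/bin/sh
# Read the duration in seconds from a media file with ffprobe
# (defined here for reference; not called by this sketch).
audio_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: frames needed for a duration, rounded up (default 30 fps).
frame_count() {
  LC_ALL=C awk -v d="$1" -v fps="${2:-30}" 'BEGIN {
    n = d * fps
    frames = int(n)
    if (frames < n) frames++   # round up so the last partial frame is kept
    print frames
  }'
}

frame_count 90   # prints 2700 (a 90 s song at 30 fps)
```

With ffprobe installed, the two can be combined as frame_count "$(audio_duration song.mp3)".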