acestep-simplemv
MV Render
Render music videos with waveform visualization and synced lyrics from audio + lyrics input.
Prerequisites
- Remotion project in the scripts/ directory within this skill
- Node.js + npm dependencies installed
- ffprobe available (for audio duration detection)
First-Time Setup
Before first use, check and install dependencies:
# 1. Check Node.js
node --version
# 2. Install npm dependencies
cd {project_root}/{.claude or .codex}/skills/acestep-simplemv/scripts && npm install
# 3. Check ffprobe
ffprobe -version
If ffprobe is not available, install ffmpeg (which includes ffprobe):
- Windows: choco install ffmpeg, or download from https://ffmpeg.org/download.html and add to PATH
- macOS: brew install ffmpeg
- Linux: sudo apt-get install ffmpeg (Debian/Ubuntu) or sudo dnf install ffmpeg (Fedora)
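The checks above can be folded into a single preflight step. This is a minimal sketch and not part of the skill itself: `missing_tools` is a hypothetical helper name, and it relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not part of render-mv.sh): print the names of any
# commands from the argument list that cannot be found on PATH.
missing_tools() (
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
)

# Usage: warn before attempting a render.
need=$(missing_tools node npm ffprobe)
if [ -n "$need" ]; then
  echo "Missing tools: $need" >&2
fi
```

An empty result means every listed tool was found; otherwise the output names what still needs installing.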
Quick Start
cd {project_root}/{.claude or .codex}/skills/acestep-simplemv/
./scripts/render-mv.sh --audio /path/to/song.mp3 --lyrics /path/to/song.lrc --title "Song Title"
Output: MP4 file at out/<audio_basename>.mp4 (or custom --output path).
Script Usage
./scripts/render-mv.sh --audio <file> --lyrics <lrc_file> --title "Title" [options]
Options:
--audio Audio file path (absolute paths supported)
--lyrics LRC format lyrics file (timestamped)
--lyrics-json JSON lyrics file [{start, end, text}] (alternative to --lyrics)
--title Video title (default: "Music Video")
--subtitle Subtitle text
--credit Bottom credit text
--offset Lyric timing offset in seconds (default: -0.5)
--output Output file path (default: out/<audio_basename>.mp4)
--codec h264|h265|vp8|vp9 (default: h264)
--background Background image file path (if omitted, uses animated gradient)
--browser Custom browser executable path (Chrome/Edge/Chromium)
--max-size Max output file size in MB (e.g. 24). Auto-compresses if exceeded.
Use for IM platforms (WhatsApp≤16MB, Discord≤25MB, Telegram≤50MB)
Environment variables:
BROWSER_EXECUTABLE Path to browser executable (overrides auto-detection)
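To make the --offset option concrete: LRC files timestamp each line as [mm:ss.xx], and the offset shifts those times. The sketch below is illustrative only. `lrc_time_to_seconds` is a hypothetical helper, and it assumes the offset is added to each timestamp and that negative results are clamped to zero; the source does not confirm either behaviour.

```shell
# Hypothetical helper: convert an LRC timestamp to seconds and apply an offset.
lrc_time_to_seconds() {
  # $1: timestamp without brackets, e.g. "01:23.45"
  # $2: offset in seconds (assumed additive; defaults to the documented -0.5)
  awk -v ts="$1" -v off="${2:--0.5}" 'BEGIN {
    split(ts, part, ":")
    t = part[1] * 60 + part[2] + off
    if (t < 0) t = 0   # assumption: never schedule a lyric before 0s
    printf "%.2f\n", t
  }'
}

# Usage: a line stamped [01:23.45] with the default offset
lrc_time_to_seconds "01:23.45"
```

Under these assumptions, a negative offset makes each lyric appear slightly before its timestamp.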
Browser Detection
Remotion requires a Chromium-based browser for rendering. The script auto-detects browsers in this priority order:
1. BROWSER_EXECUTABLE environment variable
2. --browser CLI argument
3. Remotion cache (chrome-headless-shell, downloaded by Remotion)
4. System Chrome (auto-uses --chrome-mode=chrome-for-testing)
5. System Edge (pre-installed on Windows 10/11, auto-uses --chrome-mode=chrome-for-testing)
6. System Chromium (auto-uses --chrome-mode=chrome-for-testing)
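A priority list like the one above amounts to "take the first candidate that exists". The following is a sketch of that idea, not the script's actual code. `first_executable` and `BROWSER_ARG` are hypothetical names, the system paths are illustrative Linux locations, and the Remotion cache entry is left out because its location is not given here.

```shell
# Hypothetical helper: print the first argument that is an executable file.
first_executable() {
  for candidate in "$@"; do
    if [ -n "$candidate" ] && [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage: env var first, then a (hypothetical) --browser value, then system paths.
browser=$(first_executable "${BROWSER_EXECUTABLE:-}" "${BROWSER_ARG:-}" \
  /usr/bin/google-chrome /usr/bin/microsoft-edge /usr/bin/chromium) \
  || echo "No browser found in the candidate list" >&2
```

Empty candidates are skipped, so unset variables simply fall through to the next entry.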
Important: New versions of Chrome/Edge removed the old headless mode. When using regular Chrome/Edge/Chromium, the script automatically sets --chrome-mode=chrome-for-testing (which uses --headless=new). When using chrome-headless-shell, it uses the default headless-shell mode (which uses --headless=old). This is handled transparently.
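The mode selection described above can be pictured as a small mapping from the browser binary to a --chrome-mode value. This is a sketch under assumptions: `chrome_mode_for` is a hypothetical name, and matching on the file name is a guess at how such a check might work, not the script's confirmed logic.

```shell
# Hypothetical helper: choose a --chrome-mode value from the browser path.
chrome_mode_for() {
  case "$(basename "$1")" in
    chrome-headless-shell*) echo "headless-shell" ;;      # Remotion's own download
    *)                      echo "chrome-for-testing" ;;  # regular Chrome/Edge/Chromium
  esac
}

# Usage
chrome_mode_for "/usr/bin/google-chrome"
```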
If no browser is found, Remotion will attempt to download chrome-headless-shell from Google servers. This will fail if Google servers are inaccessible from your network.
Workarounds for restricted networks
Since Edge is pre-installed on Windows 10/11, it should be auto-detected without any manual configuration. The script automatically detects Chrome/Edge and uses the correct headless mode. If auto-detection fails:
# Option 1: Set environment variable
export BROWSER_EXECUTABLE="/path/to/msedge.exe"
# Option 2: Pass as CLI argument
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --browser "/path/to/msedge.exe"
# Option 3: Enable proxy and let Remotion download chrome-headless-shell
Examples
# Basic render
./scripts/render-mv.sh --audio /tmp/abc123_1.mp3 --lyrics /tmp/abc123.lrc --title "夜桜"
# Custom output path
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "My Song" --output /tmp/my_mv.mp4
# With subtitle and credit
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --subtitle "Artist Name" --credit "Generated by ACE-Step"
# With background image
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --background /path/to/cover.jpg
# Compress for Discord upload (max 25MB)
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --max-size 24
# Compress for WhatsApp (max 16MB)
./scripts/render-mv.sh --audio song.mp3 --lyrics song.lrc --title "Song" --max-size 15
IM Platform Upload Limits
When sending an MV to a chat platform, use --max-size to auto-compress the output below the platform's limit:
| Platform | Limit | Recommended --max-size |
|---|---|---|
| WhatsApp | 16MB | 15 |
| Discord (free) | 25MB | 24 |
| Telegram | 50MB | 48 |
| Slack (free) | 1GB | - |
The compression uses ffmpeg two-pass encoding to achieve the best quality within the size constraint.
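Two-pass encoding works toward a target average bitrate, so the size limit has to be converted into a bitrate budget first. The arithmetic can be sketched as below. `target_video_kbps` is a hypothetical helper, and the 1 MB = 1,000,000 bytes convention and the 128 kbps audio default are assumptions; the script's own calculation is not shown in this document.

```shell
# Hypothetical helper: video bitrate (kbit/s) that fits a size limit.
target_video_kbps() {
  # $1: size limit in MB, $2: duration in seconds, $3: audio bitrate in kbit/s
  awk -v mb="$1" -v dur="$2" -v audio="${3:-128}" 'BEGIN {
    total = mb * 8000 / dur   # whole-file budget in kbit/s (1 MB = 8000 kbit)
    video = total - audio     # what remains after the audio track
    if (video < 0) video = 0
    printf "%d\n", video      # truncate so the estimate stays under budget
  }'
}

# Usage: 24 MB limit, 90-second song
target_video_kbps 24 90
```

With these assumptions, a 90-second song under a 24 MB cap leaves roughly 2005 kbit/s for video.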
Container / Docker Font Support
When running in containers (e.g. OpenClaw), CJK fonts may not be pre-installed, causing lyrics to render as □ boxes. The script automatically:
- Detects if CJK fonts are available (via fc-list)
- Attempts to install fonts-noto-cjk (Debian/Ubuntu), font-noto-cjk (Alpine), or google-noto-sans-cjk-fonts (Fedora/RHEL)
- Falls back with a warning and manual install instructions if auto-install fails
If auto-install doesn't work, manually install fonts before rendering:
# Debian/Ubuntu
apt-get install -y fonts-noto-cjk
# Alpine
apk add font-noto-cjk
# Fedora/RHEL
dnf install -y google-noto-sans-cjk-fonts
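For a custom container image, the detection and package mapping can be approximated as follows. This is a sketch, not the script's code. Both function names are hypothetical, the distro IDs are assumed to follow the ID field of /etc/os-release, and the font check queries Chinese coverage only as a stand-in for CJK support in general.

```shell
# Hypothetical helper: map a distro ID to the CJK font package named above.
cjk_font_package() {
  case "$1" in
    debian|ubuntu) echo "fonts-noto-cjk" ;;
    alpine)        echo "font-noto-cjk" ;;
    fedora|rhel)   echo "google-noto-sans-cjk-fonts" ;;
    *)             return 1 ;;   # unknown distro: no suggestion
  esac
}

# Hypothetical helper: succeed if fontconfig reports any Chinese-capable font.
has_cjk_font() {
  command -v fc-list >/dev/null 2>&1 || return 1
  [ -n "$(fc-list :lang=zh family 2>/dev/null)" ]
}

# Usage:
#   has_cjk_font || echo "Install: $(cjk_font_package ubuntu)"
```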
File Naming
IMPORTANT: Name the output after the audio file's job ID to avoid overwriting earlier renders. Do NOT use arbitrary custom names like --output my_song.mp4; the output name should always derive from the audio filename, as the default naming does.
The naming convention uses the audio filename as its base:
- Audio: acestep_output/{job_id}_1.mp3
- Lyrics: acestep_output/{job_id}_1.lrc
- Video: Pass --output acestep_output/{job_id}.mp4 (use the job ID from the audio file)
Example: if audio is chatcmpl-abc123_1.mp3, pass --output acestep_output/chatcmpl-abc123.mp4
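The job-ID convention can be applied mechanically with shell parameter expansion. This is a small sketch: `output_for_audio` is a hypothetical helper, and it assumes the audio name always ends in _<n> before the extension, as in the example above.

```shell
# Hypothetical helper: derive the --output path from an audio file path.
output_for_audio() (
  dir=$(dirname "$1")
  base=$(basename "$1")
  base=${base%.*}     # drop the extension:   chatcmpl-abc123_1
  job_id=${base%_*}   # drop the trailing _n: chatcmpl-abc123
  printf '%s/%s.mp4\n' "$dir" "$job_id"
)

# Usage
output_for_audio "acestep_output/chatcmpl-abc123_1.mp3"
```

The result can be passed straight to --output, which keeps the video in the same directory as the audio and lyrics.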
Title Guidelines
- Keep --title short and single-line (max ~50 chars, auto-truncated)
- Use --subtitle for additional info
- Do NOT put newlines in --title
Good: --title "Open Source" --subtitle "ACE-Step v1.5"
Bad: --title "Open Source - ACE-Step v1.5\nCelebrating Music AI"
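These guidelines can be checked before calling the render script. The sketch below is illustrative: `title_ok` is a hypothetical helper, and it rejects both real newlines and a literal \n sequence. Note that `${#1}` counts characters in bash under a UTF-8 locale but bytes in some other shells, so the length check may be stricter than intended for CJK titles.

```shell
# Hypothetical helper: succeed only for a single-line title of at most 50 chars.
title_ok() (
  newline=$(printf '\nx'); newline=${newline%x}   # portable way to hold "\n"
  case "$1" in
    *"$newline"*|*'\n'*) exit 1 ;;   # real newline or literal backslash-n
  esac
  [ "${#1}" -le 50 ]
)

# Usage
title_ok "Open Source" && echo "title accepted"
```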
Notes
- Audio files with absolute paths are auto-copied to public/ by render.mjs
- Duration is auto-detected via ffprobe
- Typical render time: ~1-2 minutes for a 90s song
- Output resolution: 1920x1080, 30fps
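To illustrate how a detected duration relates to the 30fps output, the sketch below reads the duration with ffprobe and converts it to a frame count. Both function names are hypothetical, and rounding up to a whole frame is an assumption; render.mjs may do this differently.

```shell
# Hypothetical helper: print the audio duration in seconds using ffprobe.
audio_duration_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: convert seconds to a frame count (default 30 fps).
duration_in_frames() {
  awk -v dur="$1" -v fps="${2:-30}" 'BEGIN {
    frames = dur * fps
    whole = int(frames)
    if (whole < frames) whole++   # round up so a partial last frame is kept
    printf "%d\n", whole
  }'
}

# Usage:
#   duration_in_frames "$(audio_duration_seconds song.mp3)"
```

For a 90-second song this gives 2700 frames at 30fps.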