byted-music-generate
Music Generate Skill
Generate music using the Volcengine Music Generation API. Supports vocal songs, instrumental BGM, and AI lyrics generation.
Trigger Conditions
- User wants to generate a song (with lyrics or a text prompt)
- User needs background music, instrumental tracks, or soundtracks
- User wants AI-generated lyrics
- User mentions "write a song", "music generation", "BGM", "background music", "lyrics"
Environment Variables
Two authentication methods are supported (gateway takes priority):
Option 1: API Gateway (recommended)
- ARK_SKILL_API_BASE: API gateway base URL
- ARK_SKILL_API_KEY: API gateway authentication key
Option 2: Direct AK/SK
- VOLCENGINE_ACCESS_KEY: AccessKey ID
- VOLCENGINE_SECRET_KEY: AccessKey Secret
- How to obtain: Volcengine Console → Account → Key Management → Create Key
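The priority rule between the two options can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `select_auth` and its return values are hypothetical, and only the four environment variable names come from this document.

```python
import os
from typing import Mapping, Optional


def select_auth(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "gateway" or "aksk" depending on which variable pair is set.

    Hypothetical helper: it mirrors the documented priority order only.
    """
    env = os.environ if env is None else env
    # Option 1: the API gateway pair takes priority when both values are set.
    if env.get("ARK_SKILL_API_BASE") and env.get("ARK_SKILL_API_KEY"):
        return "gateway"
    # Option 2: fall back to direct AK/SK credentials.
    if env.get("VOLCENGINE_ACCESS_KEY") and env.get("VOLCENGINE_SECRET_KEY"):
        return "aksk"
    raise PermissionError("Authentication not configured")
```

Calling `select_auth()` with no argument inspects the current process environment, which makes it usable as a pre-flight check before running the script.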
Usage
- Determine user intent and select the mode (song/bgm/lyrics).
- cd to the skill directory: skills/byted-music-generate.
- Run the script. The script polls the API internally and may take several minutes to complete (typically 1–5 minutes for song/bgm).
- Monitor execution: If the runtime environment moves the command to background, you MUST periodically (every 10 seconds) read the terminal output to check whether the script has finished. The script prints polling progress to stderr and outputs a single JSON line to stdout upon completion.
- Once completed, return the audio_url or lyrics from the JSON output to the user.
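For callers that drive the script from Python rather than a terminal, the steps above might look like the sketch below. The function names `run_skill` and `parse_result` are hypothetical. The sketch assumes the working directory layout given above and relies on the stated behavior that progress goes to stderr and the result is a single JSON line on stdout.

```python
import json
import subprocess
import sys
from typing import List


def parse_result(stdout: str) -> dict:
    """Parse the last non-empty stdout line as the JSON result."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("script produced no stdout output")
    return json.loads(lines[-1])


def run_skill(args: List[str], cwd: str = "skills/byted-music-generate") -> dict:
    """Run scripts/music_generate.py with the given arguments (blocking)."""
    proc = subprocess.run(
        [sys.executable, "scripts/music_generate.py", *args],
        cwd=cwd,
        stdout=subprocess.PIPE,  # capture the JSON result line
        stderr=None,             # let polling progress stream to the terminal
        text=True,
    )
    return parse_result(proc.stdout)
```

For example, `run_skill(["song", "--prompt", "A song about summer at the beach"])` would block until the script exits and then return the parsed result dictionary.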
Three Modes
1. song — Vocal Song
User provides lyrics (Lyrics) or a text prompt (Prompt) to generate a vocal song.
# With text prompt
python scripts/music_generate.py song --prompt "A song about summer at the beach" --genre Pop --gender Female
# With lyrics
python scripts/music_generate.py song --lyrics "[verse]\nMoonlight on the windowsill\nMemories flowing like water\n[chorus]\nYou are my moonlight" --genre Folk --mood "Sentimental/Melancholic/Lonely"
Note: Provide either --lyrics or --prompt, not both. If both are given, --lyrics takes priority. If the user hasn't provided lyrics, you can first use the lyrics mode to generate them, then pass the result to the song mode.
2. bgm — Instrumental BGM
Describe the desired music in natural language. The v5.0 model does not require Genre/Mood parameters — just describe everything in the --text field.
python scripts/music_generate.py bgm --text "Relaxed coffee shop ambiance music with piano and guitar" --duration 60
# With song structure segments
python scripts/music_generate.py bgm --text "Epic game soundtrack" --segments '[{"Name":"intro","Duration":10},{"Name":"chorus","Duration":30}]'
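Hand-writing the --segments JSON inside shell quotes is error-prone, so one option is to build the argument list programmatically. The sketch below is illustrative only: `build_bgm_args` is a hypothetical helper, and whether --duration and --segments may be combined is not stated in this document.

```python
import json
from typing import List, Optional, Sequence, Tuple


def build_bgm_args(
    text: str,
    duration: Optional[int] = None,
    segments: Optional[Sequence[Tuple[str, int]]] = None,
) -> List[str]:
    """Build the argument list for the bgm mode (hypothetical helper)."""
    args = ["bgm", "--text", text]
    if duration is not None:
        args += ["--duration", str(duration)]
    if segments:
        # Each segment becomes {"Name": ..., "Duration": ...}, matching the
        # shape used in the command-line example above.
        payload = [{"Name": name, "Duration": seconds} for name, seconds in segments]
        args += ["--segments", json.dumps(payload, separators=(",", ":"))]
    return args
```

Passing the resulting list to `subprocess.run` avoids shell quoting entirely, because each element is delivered to the script as one argument.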
3. lyrics — Lyrics Generation
Returns synchronously (no polling needed). Can be used standalone or as a pre-step for the song mode.
python scripts/music_generate.py lyrics --prompt "A song about graduation farewell" --genre Folk --mood "Sentimental/Melancholic/Lonely" --gender Female
Manual Task Query (timeout fallback)
python scripts/music_generate.py query --task-id "202601397834584670076931"
Mode Detection Logic
User Request
    ↓
Contains "instrumental/BGM/background music/soundtrack"?
├─ Yes → bgm mode
└─ No → Contains "lyrics/write lyrics" and does NOT request audio?
    ├─ Yes → lyrics mode
    └─ No → song mode
        ├─ User provided lyrics → --lyrics
        └─ User only described a theme → --prompt (or lyrics first, then song)
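The decision tree above can be approximated with simple keyword matching. This is a rough sketch: `detect_mode` is a hypothetical function, and the `AUDIO_KEYWORDS` list is an assumption about how "requests audio" might be detected. Real requests will usually need more careful intent analysis than substring checks.

```python
BGM_KEYWORDS = ("instrumental", "bgm", "background music", "soundtrack")
LYRICS_KEYWORDS = ("lyrics",)  # also matches "write lyrics"
# Assumption: words that suggest the user wants audio, not just text.
AUDIO_KEYWORDS = ("song", "sing", "audio", "music", "track")


def detect_mode(request: str) -> str:
    """Return "bgm", "lyrics", or "song" for a user request."""
    text = request.lower()
    if any(keyword in text for keyword in BGM_KEYWORDS):
        return "bgm"
    wants_lyrics = any(keyword in text for keyword in LYRICS_KEYWORDS)
    wants_audio = any(keyword in text for keyword in AUDIO_KEYWORDS)
    if wants_lyrics and not wants_audio:
        return "lyrics"
    # Everything else is treated as a vocal song request.
    return "song"
```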
Script Parameters
song mode
| Parameter | Required | Description |
|---|---|---|
| --lyrics | either | Lyrics with structure tags |
| --prompt | either | Text prompt (Chinese, 5-700 chars) |
| --model-version | no | v4.0 or v4.3 (default: v4.3) |
| --genre | no | Music genre |
| --mood | no | Music mood |
| --gender | no | Female / Male |
| --timbre | no | Vocal timbre |
| --duration | no | Duration in seconds [30-240] |
| --key | no | Musical key (v4.3 only) |
| --kmode | no | Major / Minor (v4.3 only) |
| --tempo | no | Tempo (v4.3 only) |
| --instrument | no | Instruments, comma-separated (v4.3 only) |
| --genre-extra | no | Secondary genres, comma-separated, max 2 (v4.3 only) |
| --scene | no | Scene tags, comma-separated (v4.3 only) |
| --lang | no | Language (v4.3 only) |
| --vod-format | no | wav / mp3 (v4.3 only) |
| --billing | no | prepaid / postpaid (default: postpaid) |
| --timeout | no | Max wait seconds (default: 300) |
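The constraints in the song table can be checked before the script is invoked. The sketch below is a hypothetical pre-flight check (`validate_song_options` is not part of the script), and it is an assumption that catching these cases early is preferable to letting the API reject them. Option keys use underscores in place of the hyphens in the flag names.

```python
from typing import Any, Dict, List

# Options the table marks as "v4.3 only".
V43_ONLY = {"key", "kmode", "tempo", "instrument",
            "genre_extra", "scene", "lang", "vod_format"}


def validate_song_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no conflicts found."""
    problems = []
    lyrics, prompt = options.get("lyrics"), options.get("prompt")
    if not lyrics and not prompt:
        problems.append("one of --lyrics or --prompt is required")
    # The prompt length only matters when lyrics are not supplied.
    if prompt and not lyrics and not 5 <= len(prompt) <= 700:
        problems.append("--prompt must be 5-700 characters")
    duration = options.get("duration")
    if duration is not None and not 30 <= duration <= 240:
        problems.append("--duration must be within [30, 240] seconds")
    version = options.get("model_version", "v4.3")
    if version not in ("v4.0", "v4.3"):
        problems.append("--model-version must be v4.0 or v4.3")
    elif version == "v4.0":
        used = sorted(k for k in V43_ONLY if options.get(k) is not None)
        if used:
            problems.append("v4.3-only options used with v4.0: " + ", ".join(used))
    return problems
```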
bgm mode
| Parameter | Required | Description |
|---|---|---|
| --text | yes | Natural language description |
| --duration | no | Duration in seconds [30-120] |
| --segments | no | JSON array of song structure segments |
| --version | no | Model version (default: v5.0) |
| --enable-input-rewrite | no | Enable prompt rewriting |
| --billing | no | prepaid / postpaid (default: postpaid) |
| --timeout | no | Max wait seconds (default: 300) |
lyrics mode
| Parameter | Required | Description |
|---|---|---|
| --prompt | yes | Lyrics prompt (Chinese only, <500 chars) |
| --genre | no | Music genre |
| --mood | no | Music mood |
| --gender | no | Female / Male |
Script Return Info
The script outputs JSON with the following fields:
{
"status": "success | timeout | error",
"mode": "song | bgm | lyrics | query",
"task_id": "...",
"audio_url": "https://...",
"duration": 46.0,
"lyrics": "...",
"error": null
}
Return the audio_url to the user for download or playback. URLs are valid for approximately 1 year, but users should save the file promptly.
Error Handling
- IF the script raises PermissionError: Authentication not configured ..., inform the user to configure either API gateway (ARK_SKILL_API_BASE + ARK_SKILL_API_KEY) or direct AK/SK (VOLCENGINE_ACCESS_KEY + VOLCENGINE_SECRET_KEY) environment variables. Write them to the workspace environment variable file, then retry.
- IF status is "timeout", the task is still generating. Provide the user with the task_id and the manual query command from the output.
- IF copyright check fails (code 50000001), suggest the user enrich the description or increase the audio duration, then retry.
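The status handling above can be sketched as a small dispatcher over the parsed JSON result. `next_step` is a hypothetical helper. Two assumptions are made: the copyright code 50000001 appears somewhere in the error field (the exact error format is not documented here), and the query command follows the shape shown under Manual Task Query.

```python
from typing import Any, Dict


def next_step(result: Dict[str, Any]) -> str:
    """Turn a parsed script result into the message to relay to the user."""
    status = result.get("status")
    if status == "success":
        # song/bgm results carry audio_url; lyrics results carry lyrics.
        return result.get("audio_url") or result.get("lyrics") or ""
    if status == "timeout":
        # The task is still generating: hand back the manual query command.
        task_id = result.get("task_id")
        return f'python scripts/music_generate.py query --task-id "{task_id}"'
    error = str(result.get("error"))
    if "50000001" in error:  # assumption: the code is embedded in the error text
        return "Copyright check failed: enrich the description or increase the duration, then retry."
    return f"Generation failed: {error}"
```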
References
- Available parameter values (Genre/Mood/Timbre/Instrument etc.): references/parameters.md
- Volcengine Music Generation Docs
- API Signature Guide