minimax-multimodal-toolkit
MiniMax Multi-Modal Toolkit
Generate voice, music, video, and image content via MiniMax APIs. This is the unified entry point for MiniMax multimodal use cases (audio, music, video, and image). It includes voice cloning and voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video format conversion, concatenation, trimming, and extraction.
Setup & Configuration
Prerequisites
brew install ffmpeg jq # macOS
sudo apt install ffmpeg jq # Linux (Debian/Ubuntu)
bash scripts/check_environment.sh # verify environment
No Python or pip required: all scripts are pure bash and rely only on curl, ffmpeg, jq, and xxd.
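For readers who want to see what a dependency check of this kind looks like, here is a minimal sketch. It is not the contents of scripts/check_environment.sh; the function name missing_tools is hypothetical, and the only thing taken from the text above is the list of tools (curl, ffmpeg, jq, xxd).

```shell
# Hypothetical helper, not part of the toolkit: report which of the given
# command-line tools are absent from PATH.
# Prints one "missing: <tool>" line per absent tool and returns 1 if any
# tool is missing, 0 otherwise.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v is the portable way to test whether a command resolves.
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `missing_tools curl ffmpeg jq xxd && echo "environment OK"`. Defining the check as a function keeps it side-effect free until it is called, so it can be sourced from other scripts.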
Note:
ffmpeg is required for TTS voice bubble conversion (.mp3 → .opus). Without it, TTS audio is sent as a file attachment instead of a native voice bubble.
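The .mp3 → .opus step mentioned in the note can be sketched with a plain ffmpeg call. This is an illustration under assumptions, not the toolkit's own script: the function names opus_name and mp3_to_opus are hypothetical, the 32k bitrate is an arbitrary choice for speech, and the command assumes an ffmpeg build that includes the libopus encoder.

```shell
# Hypothetical helper: derive the output path by swapping a trailing
# ".mp3" for ".opus" (pure string handling, no ffmpeg needed).
opus_name() {
  printf '%s\n' "${1%.mp3}.opus"
}

# Hypothetical helper: convert an MP3 file to Ogg Opus next to the source
# and print the output path on success.
# Assumptions: ffmpeg is built with libopus; 32k is an illustrative bitrate.
mp3_to_opus() {
  local src=$1 dst
  dst=$(opus_name "$src")
  # -y overwrites an existing output; -loglevel error keeps ffmpeg quiet.
  ffmpeg -y -loglevel error -i "$src" -c:a libopus -b:a 32k "$dst" \
    && printf '%s\n' "$dst"
}
```

For example, `mp3_to_opus speech.mp3` would write speech.opus in the same directory. Keeping the filename logic in its own function makes it easy to check without running a conversion.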
API Configuration