giggle-voice-clone
Voice Clone
Clones a voice from a reference audio URL via giggle.pro. Flow: submit voice-clone with file.url directly → script polls until completed. Returns full signed audio URLs. Tell the user before the long-running exec that generation is in progress and you will return as soon as the script finishes (see Continuous progress updates).
API Key: Set system environment variable GIGGLE_API_KEY. Obtain it at giggle.pro while logged in: left sidebar → API Key (API 密钥). The script will prompt if not configured.
No inline Python: All commands must be executed via the exec tool. Never use heredoc inline code.
No Retry on Error: If script execution encounters an error, do not retry. Report the error to the user directly and stop.
Execution Flow
Voice cloning typically takes 1–3 minutes. The script submits voice-clone with file.url directly (no upload step), then polls for result.
Important: Never pass GIGGLE_API_KEY in exec's env parameter. The API key is read from the system environment variable.
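How the script reads the key is not shown in this document. As a minimal sketch, assuming it simply looks the variable up in the process environment, a pre-flight check could look like the following. The helper name api_key_configured is hypothetical and is not part of scripts/voice_clone_api.py.

```python
import os
from typing import Mapping, Optional


def api_key_configured(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if GIGGLE_API_KEY is set to a non-empty value.

    Hypothetical helper: it only inspects the environment and never
    prints or forwards the key itself.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("GIGGLE_API_KEY", "").strip())
```

The optional environ argument exists only so the check can be exercised with a plain dict; in normal use it falls back to os.environ.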
Continuous progress updates (default; user need not put this in their prompt)
One exec runs the whole flow: submit + poll until done (--max-wait, default 180s). There is no separate --query loop for you to run.
- Before the script, tell the user cloning/synthesis started, typical wait ~1–3 minutes, and you will paste signed audio links or errors when the command finishes—never start a long run with zero message.
- Do not wait for the user to say “check status” before launching the script after that preamble.
- When the script exits, immediately forward stdout (URLs or errors) in natural language; on timeout at --max-wait, explain and suggest retry or a different sample/voice_id.
- If the user wants zero preamble, use one minimal line only, then run the script.
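The submit-then-poll behavior described above can be sketched as a loop bounded by a deadline. This is an illustration under stated assumptions, not the script's actual code: get_status is a hypothetical callable, the "failed" status name and the 5-second interval are assumptions, and only "completed" and the 180-second default come from this document.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    get_status: Callable[[], str],
    max_wait: float = 180.0,   # mirrors the documented --max-wait default
    interval: float = 5.0,     # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until a terminal status is seen or max_wait elapses.

    Returns the terminal status, or None on timeout.
    """
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status in ("completed", "failed"):  # "failed" is an assumed name
            return status
        if clock() + interval > deadline:
            return None  # the next poll would land past the deadline
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable without real waiting; a None result corresponds to the timeout case where the user should be told and offered a retry or a different sample/voice_id.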
Step 1: Guide User to Provide Requirements
Before running, you must collect:
- Audio URL – A publicly accessible URL of the reference audio (e.g. MP3, WAV). User provides a link to the sample they want to clone.
- voice_id – User-defined. Must be unique per clone. Examples: my_voice_001, minimax_testasds_02. If it is a duplicate, the API returns "voice clone voice id duplicate".
- Text – The text to synthesize with the cloned voice (e.g. "A gentle breeze sweeps across the soft grass...").
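Before launching the script, it can help to confirm that all three inputs are present. The sketch below is a hypothetical pre-check (missing_requirements is not part of the script); it only tests that the URL looks like http(s) and that the other fields are non-empty. It cannot tell whether the URL is actually publicly reachable or whether the voice_id is already taken.

```python
from typing import List
from urllib.parse import urlparse


def missing_requirements(audio_url: str, voice_id: str, text: str) -> List[str]:
    """Return the names of inputs that are missing or obviously malformed."""
    problems = []
    parsed = urlparse(audio_url.strip())
    # A usable reference must at least be an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("audio_url")
    if not voice_id.strip():
        problems.append("voice_id")
    if not text.strip():
        problems.append("text")
    return problems
```

An empty list means all three requirements were supplied; otherwise the returned names indicate what to ask the user for next.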
Step 2: Run Full Flow
Before the command below, send the user the short preamble described above.
python3 scripts/voice_clone_api.py \
--audio-url "https://example.com/voice_sample.mp3" \
--text "A gentle breeze sweeps across the soft grass, carrying the fresh scent." \
--voice-id "my_unique_voice_01" \
--need-noise-reduction false \
--need-volumn-normalization false
Optional parameters:
- --need-noise-reduction (default: false): Apply noise reduction to cloned audio
- --need-volumn-normalization (default: false): Apply volume normalization
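The command above can also be assembled as an argument list, which sidesteps shell-quoting problems when the text contains spaces or quotes. This is a sketch only: build_command is a hypothetical helper, while the script path and flag names (including the spelling --need-volumn-normalization) are copied from the command shown in this document.

```python
import shlex
from typing import List


def build_command(audio_url: str, text: str, voice_id: str,
                  noise_reduction: bool = False,
                  volume_normalization: bool = False) -> List[str]:
    """Build the argv list for scripts/voice_clone_api.py."""
    def flag(value: bool) -> str:
        return "true" if value else "false"

    return [
        "python3", "scripts/voice_clone_api.py",
        "--audio-url", audio_url,
        "--text", text,
        "--voice-id", voice_id,
        "--need-noise-reduction", flag(noise_reduction),
        "--need-volumn-normalization", flag(volume_normalization),
    ]


def command_line(argv: List[str]) -> str:
    """Render argv as a single safely quoted shell line."""
    return shlex.join(argv)
```

Note that the list carries no API key: GIGGLE_API_KEY stays in the system environment and is never passed as an argument or through exec's env parameter.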
Step 3: Handle Output
Success: Script outputs the full signed audio URL(s). Forward to user as-is.
Failure:
- voice clone voice id duplicate: Guide user to choose a different voice_id
- Other errors: Report error message to user
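The success and failure handling above amounts to a small decision on the script's output. The sketch below assumes the caller already knows whether the run succeeded (for example from the exit code, which this document does not specify); user_message is a hypothetical helper, and only the duplicate error string comes from the source.

```python
DUPLICATE_MARKER = "voice clone voice id duplicate"


def user_message(output: str, succeeded: bool) -> str:
    """Map script output to the message forwarded to the user."""
    if succeeded:
        # Signed URLs are forwarded as-is; only surrounding whitespace is trimmed.
        return output.strip()
    if DUPLICATE_MARKER in output.lower():
        return ("That voice_id is already in use; "
                "please pick another unique id and try again.")
    # Any other error is reported directly, with no retry.
    return "Voice cloning failed: " + output.strip()
```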
Link Return Rule
Audio links returned to the user must be full signed URLs (with Policy, Key-Pair-Id, Signature query params). Do not strip response-content-disposition=attachment when the API returns it. The script only normalizes ~ → %7E; keep URLs as-is when forwarding.
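A quick way to confirm that a link still carries its signature is to inspect the query string. The following is a minimal sketch using only the standard library; is_full_signed_url and normalize_tilde are hypothetical names, and normalize_tilde merely mirrors the ~ → %7E normalization attributed to the script above rather than reproducing its actual code.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED_PARAMS = ("Policy", "Key-Pair-Id", "Signature")


def is_full_signed_url(url: str) -> bool:
    """Return True if all three signing parameters are present in the query."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return all(name in query for name in REQUIRED_PARAMS)


def normalize_tilde(url: str) -> str:
    """Percent-encode literal tildes and leave everything else untouched."""
    return url.replace("~", "%7E")
```

Neither function removes parameters, so response-content-disposition=attachment survives unchanged when present.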
Parameter Reference
| Parameter | Required | Default | Description |
|---|---|---|---|
| --audio-url | yes | - | Public URL of reference audio to clone |
| --text | yes | - | Text to synthesize with cloned voice |
| --voice-id | yes | - | User-defined unique voice identifier; must not duplicate existing |
| --need-noise-reduction | no | false | Apply noise reduction |
| --need-volumn-normalization | no | false | Apply volume normalization |
| --max-wait | no | 180 | Max wait seconds for clone task |
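For orientation, the parameter table maps naturally onto an argparse definition. The real script's parser is not shown in this document, so the following is an assumed reconstruction: build_parser and str_to_bool are hypothetical, and the accepted boolean spellings beyond true/false are a guess.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse the literal 'true'/'false' values used on the command line."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected true or false, got %r" % value)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Voice clone CLI (illustrative)")
    parser.add_argument("--audio-url", required=True,
                        help="Public URL of reference audio to clone")
    parser.add_argument("--text", required=True,
                        help="Text to synthesize with cloned voice")
    parser.add_argument("--voice-id", required=True,
                        help="User-defined unique voice identifier")
    parser.add_argument("--need-noise-reduction", type=str_to_bool, default=False)
    # Flag spelling ("volumn") follows the documented command exactly.
    parser.add_argument("--need-volumn-normalization", type=str_to_bool, default=False)
    parser.add_argument("--max-wait", type=int, default=180,
                        help="Max wait seconds for clone task")
    return parser
```

Because the boolean flags take an explicit value (as in --need-noise-reduction false), they are modeled with a type converter rather than store_true.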
Interaction Guide
When the user initiates voice clone:
- Ask: "Please provide a publicly accessible URL to the reference audio to clone."
- Ask: "Please choose a unique voice_id for this clone (e.g. my_voice_001); it must not duplicate an existing clone."
- Ask: "Please provide the text to synthesize with the cloned voice."
- After the user provides all three, run the script and forward the output.
If the voice_id is a duplicate, respond: "That voice_id is already in use; please pick another unique id and try again."