genmedia-producer
GenMedia Producer Skill
You are a highly capable media production assistant. Use this skill when asked to help with storyboarding, podcast creation, or complex multi-step media workflows using the Google GenMedia MCP servers.
Core Audio Production Workflow
- Script Preparation: Remove markdown formatting (*, #) and replace structure with spoken language.
- Generation: Gemini TTS is the preferred tool for high-fidelity speech synthesis. Use gemini_audio_tts for core synthesis. Fall back to chirp_tts for specialized voices. For long text, split into manageable chunks.
- Assembly: Use ffmpeg_concatenate_media_files to assemble mixed-source audio.
- Bumpers: Create 5-second intro/outro music using lyria_generate_music (with the lyria-3-clip-preview model), and ensure a smooth transition with afade.
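The script preparation and chunking steps can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the helper names to_spoken_text and split_into_chunks and the 800-character default are assumptions, and the actual size limit of the TTS tools is not stated here.

```python
import re


def to_spoken_text(script: str) -> str:
    """Strip common markdown markers so a TTS engine does not read them aloud."""
    lines = []
    for line in script.splitlines():
        line = re.sub(r"^\s*#+\s*", "", line)    # leading heading markers
        line = re.sub(r"^\s*[-*]\s+", "", line)  # leading bullet markers
        line = line.replace("*", "").replace("#", "")  # leftover emphasis marks
        if line.strip():
            lines.append(line.strip())
    return "\n".join(lines)


def split_into_chunks(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars.

    max_chars is a hypothetical limit. A single sentence longer than
    max_chars is kept intact as its own oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting files joined in order during assembly. Splitting on sentence boundaries, rather than at a fixed character offset, avoids cutting a word or phrase in half between two audio files.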
Storyboarding
For videos longer than 8 seconds, construct a scene-by-scene narrative that can be segmented into clips of 5 to 8 seconds each. Use nanobanana_image_generation to create a visual reference for each scene.
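One way to plan the segmentation is to pick the smallest number of equal-length clips that fit the 5 to 8 second window. The sketch below is illustrative only: plan_clips is a hypothetical helper, and it assumes fractional clip lengths are acceptable, so the results may need rounding to whatever durations the video tool actually accepts.

```python
import math


def plan_clips(total_seconds: float, min_len: float = 5.0, max_len: float = 8.0) -> list[float]:
    """Split a target runtime into equal clips whose length lies in [min_len, max_len]."""
    if total_seconds <= max_len:
        # Short enough for a single clip; no storyboard segmentation needed.
        return [total_seconds]
    count = math.ceil(total_seconds / max_len)  # fewest clips that stay under max_len
    length = total_seconds / count
    if length < min_len:
        # Runtimes strictly between max_len and 2 * min_len cannot be split evenly.
        raise ValueError(
            f"{total_seconds}s cannot be split into clips of {min_len}-{max_len}s"
        )
    return [round(length, 2)] * count
```

For a 30-second piece this yields four clips of 7.5 seconds, which maps to four storyboard scenes and four reference images.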
Veo Video Generation (Veo 3.1)
- Use the Five-Part Formula for prompts: Cinematography, Subject, Action, Context, and Style.
- Soundstage Direction: Use quotation marks for dialogue and specific labels (e.g., [loud thunder]) for sound effects.
- Advanced Modalities: Use veo_first_last_to_video for transitions, veo_ingredients_to_video for character/style consistency across scenes, and veo-3.1-lite-generate-001 for faster 720p/1080p generation.
- If a request times out, retry once. If it fails again, reduce the duration parameter and inform the user.
- For voiceovers, ensure the video's total runtime matches the audio duration (use ffmpeg_get_media_info).
- The bucket parameter must be a full GCS URI (gs://...).
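The prompt formula, the bucket check, and the timeout policy above can be sketched together in Python. Everything here is a hypothetical illustration: ShotPrompt, require_gcs_uri, and generate_with_retry are invented names, the generate callable stands in for whatever actually invokes the Veo tool, and the "The character says:" phrasing is just one possible way to introduce quoted dialogue.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ShotPrompt:
    """Five-Part Formula fields plus optional soundstage direction."""
    cinematography: str
    subject: str
    action: str
    context: str
    style: str
    dialogue: str = ""
    sound_effect: str = ""

    def render(self) -> str:
        parts = [self.cinematography, self.subject, self.action, self.context, self.style]
        text = ", ".join(p.strip() for p in parts if p.strip()) + "."
        if self.dialogue:
            text += f' The character says: "{self.dialogue}"'  # dialogue in quotation marks
        if self.sound_effect:
            text += f" [{self.sound_effect}]"  # sound effect as a bracketed label
        return text


def require_gcs_uri(bucket: str) -> str:
    """Reject bucket values that are not a full gs:// URI."""
    if not bucket.startswith("gs://") or len(bucket) == len("gs://"):
        raise ValueError(f"bucket must be a full GCS URI (gs://...), got {bucket!r}")
    return bucket


def generate_with_retry(
    generate: Callable[[int], str], duration: int, reduced_duration: int
) -> tuple[str, int]:
    """Try the requested duration, retry once on timeout, then use the reduced duration.

    Returns the result together with the duration actually used, so the
    caller can tell the user when the clip was shortened.
    """
    for _ in range(2):  # first attempt plus one retry
        try:
            return generate(duration), duration
        except TimeoutError:
            continue
    return generate(reduced_duration), reduced_duration
```

Returning the duration alongside the result keeps the "inform the user" step explicit: if the returned duration differs from the requested one, the assistant knows to report the change.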