tts-script-writer

TTS Script Writer

Use this skill when the user wants a solo voiceover script that will be narrated by AI text-to-speech (ElevenLabs, etc.). It ensures the script is acoustically clean (no mispronounced numbers or symbols), expressive (via audio tags and punctuation), and structurally sound for TTS generation.

Golden rule: Write for the ear, not the eye. If it looks fine on paper but sounds wrong when spoken, it's wrong.

1. Use When

User says: "write a voiceover script", "narration script", "TTS script"
User mentions: "ElevenLabs script", "AI voice script", "text to speech script"
Content types: explainer videos, tutorials, audiobooks, podcast segments, e-learning modules, product demos, guided meditations, announcements
Any single-speaker script fed to a TTS engine

2. Script Structure Templates

A. Explainer / Tutorial (2-5 min)

Hook (0:00-0:15): One sentence that promises value or provokes curiosity.
Context (0:15-0:45): Why this matters. One paragraph max.
Core Concepts (0:45-3:00): 2-4 sections. One idea per section.
- Concept → Example → Takeaway
Practical Application (3:00-4:00): Step-by-step or demo narration.
Wrap / CTA (4:00-5:00): Summary + next step. No generic "thanks for watching."

B. Audiobook / Long-Form Narration (5+ min)

Scene Setting: Establish mood with descriptive language and audio tags.
Pacing Variation: Alternate between action (fast, short sentences) and reflection (slower, longer clauses with ellipses).
Character Voices (if applicable): Use audio tags to shift delivery: [whispers] for secrets, [angry] for conflict, [sad] for loss.
Chapter Breaks: Use <break time="2.0s" /> (v2/v2.5) or [long pause] (v3) between scenes.

C. Podcast Intro / Promo (30-60s)

Identity Line (0:00-0:05): Show name + host name. Confident, direct.
Episode Tease (0:05-0:25): What's in this episode. One compelling fact.
Value Proposition (0:25-0:45): Why the listener should stay.
Call to Action (0:45-1:00): Subscribe, follow, or visit. Specific, not generic.

D. Product Demo / Announcement (1-3 min)

Problem Statement (0:00-0:20): The pain point. Relatable language.
Solution Reveal (0:20-0:40): Product name + one-sentence value prop.
Feature Walkthrough (0:40-2:00): 3 features max. Benefits, not specs.
Proof / Social (2:00-2:30): One testimonial or metric.
CTA (2:30-3:00): Exact action. "Go to [URL]." (normalized for speech)
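The timestamps above are easier to write against once each section's duration becomes a word budget. The sketch below assumes a narration rate of about 150 words per minute. That rate is an assumption, not an ElevenLabs figure, so time a test generation with your chosen voice and adjust `WORDS_PER_MINUTE`.

```python
# Rough word budget per template section.
# ASSUMPTION: about 150 spoken words per minute; tune after a test render.
WORDS_PER_MINUTE = 150


def to_seconds(timestamp: str) -> int:
    """Convert an 'M:SS' timestamp such as '0:45' into seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def word_budget(start: str, end: str, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate number of words that fit between two timestamps."""
    duration = to_seconds(end) - to_seconds(start)
    return round(duration * wpm / 60)


# Explainer / Tutorial template: (section, start, end).
EXPLAINER = [
    ("Hook", "0:00", "0:15"),
    ("Context", "0:15", "0:45"),
    ("Core Concepts", "0:45", "3:00"),
    ("Practical Application", "3:00", "4:00"),
    ("Wrap / CTA", "4:00", "5:00"),
]

if __name__ == "__main__":
    for name, start, end in EXPLAINER:
        print(f"{name}: ~{word_budget(start, end)} words")
```

At this assumed rate, a 30-second Context section comes to roughly 75 words. Budgets like this are a drafting aid only. Audio tags and pauses change the real timing.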

3. TTS Text Normalization (Non-Negotiable)

TTS models mispronounce numbers, symbols, dates, and abbreviations. Normalize ALL of these in the script before adding audio tags.

Normalization Table

Raw Input	Spoken Form	Notes / Example
`$42.50`	forty-two dollars and fifty cents	`$99.99` → ninety-nine dollars and ninety-nine cents
`£1,001.32`	one thousand and one pounds and thirty-two pence
`€100`	one hundred euros
`¥1000`	one thousand yen
`1234`	one thousand two hundred thirty-four	Expand all bare numbers > 20
`3.14`	three point one four
`555-555-5555`	five five five, five five five, five five five five	Phone numbers digit-by-digit
`2nd`	second	All ordinals
`XIV`	fourteen	Roman numerals ("the fourteenth" if a title)
`⅔`	two-thirds
`Dr.`	Doctor	Expand abbreviations
`Ave.`	Avenue
`St.`	Street	For saints, write "Saint": "St. Patrick" → "Saint Patrick"
`Ctrl + Z`	control z	Keyboard shortcuts
`100km`	one hundred kilometers	Unit abbreviations
`100%`	one hundred percent	Percentages
`elevenlabs.io/docs`	eleven labs dot io slash docs	URLs: spell out separators
`2024-01-01`	January first, two thousand twenty-four	Dates
`14:30`	two thirty PM	Times
`01/02/2023`	January second, two thousand twenty-three	Pick the locale-appropriate form (US month/day shown)
`API`	A-P-I or "application programming interface"	Acronyms: spell out if uncommon
`HTML`	H-T-M-L or "hypertext markup language"
`npm`	N-P-M	Package managers as letters
`JSON`	J-son or "Jay-sawn"	Choose the pronunciation you want
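Some rows of the table can be applied mechanically before a human pass. The sketch below covers three of them: bare integers, dollar amounts, and phone numbers, plus percentages. It uses a small hand-written US-English number speller with no "and" and no third-party library. The function names are hypothetical, and it is not a complete normalizer. Dates, units, ordinals, and acronyms still need the table.

```python
import re

# Minimal US-English number speller written for this sketch (no "and").
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer: 1234 -> 'one thousand two hundred thirty-four'."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[rest]}" if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = f"{ONES[hundreds]} hundred"
    else:
        # Largest scale that fits, e.g. 'thousand' for 1234.
        scale, name = next((v, s) for v, s in SCALES if n >= v)
        count, rest = divmod(n, scale)
        head = f"{number_to_words(count)} {name}"
    return head + (f" {number_to_words(rest)}" if rest else "")


def dollars_to_words(amount: str) -> str:
    """'$42.50' -> 'forty-two dollars and fifty cents'."""
    whole, _, frac = amount.lstrip("$").replace(",", "").partition(".")
    dollars = int(whole or 0)
    cents = int((frac + "00")[:2]) if frac else 0
    parts = []
    if dollars or not cents:
        parts.append(f"{number_to_words(dollars)} dollar{'' if dollars == 1 else 's'}")
    if cents:
        parts.append(f"{number_to_words(cents)} cent{'' if cents == 1 else 's'}")
    return " and ".join(parts)


def phone_to_words(phone: str) -> str:
    """Read each digit group separately, with commas as pauses between groups."""
    groups = re.findall(r"\d+", phone)
    return ", ".join(" ".join(ONES[int(d)] for d in group) for group in groups)


MONEY = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{1,2})?")
PERCENT = re.compile(r"\b(\d+)%")


def normalize(text: str) -> str:
    """Apply the dollar and percent rules to running text."""
    text = MONEY.sub(lambda m: dollars_to_words(m.group()), text)
    return PERCENT.sub(lambda m: f"{number_to_words(int(m.group(1)))} percent", text)
```

For example, `normalize("Save 100% now for $99.99.")` gives "Save one hundred percent now for ninety-nine dollars and ninety-nine cents." Trailing punctuation is kept so the sentence still ends cleanly.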

Code & Technical Content

Code snippets: Write them out as spoken words, not raw symbols.
- Bad: const x = useState(0)
- Good: "const x equals use state zero"
File paths: Spell separators. src/components/Button.tsx → "src slash components slash button dot tee-ess-ex"
Git commands: git commit -m "fix" → "git commit dash m fix"
Regex: /^[a-z]+$/i → "slash caret a through z plus dollar slash i"
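File paths and shortcuts follow simple substitution rules, so they can be verbalized with a small helper. The sketch below reproduces the `Button.tsx` example above. The extension and key-name maps are illustrative assumptions, not a standard, and the helper names are hypothetical.

```python
# Spoken names for extensions, separators, and keys: illustrative choices only.
SPOKEN_EXTENSIONS = {"tsx": "tee-ess-ex", "ts": "tee-ess", "js": "jay-ess",
                     "json": "jay-son", "md": "em-dee"}
SEPARATORS = {"/": " slash ", "\\": " backslash ", "_": " underscore ",
              "-": " dash ", ".": " dot "}
KEY_NAMES = {"ctrl": "control", "cmd": "command", "esc": "escape",
             "del": "delete"}


def speak_path(path: str) -> str:
    """'src/components/Button.tsx' -> 'src slash components slash button dot tee-ess-ex'."""
    stem, dot, ext = path.rpartition(".")
    if not dot or "/" in ext or "\\" in ext:  # no real file extension
        stem, ext = path, ""
    spoken = stem
    for symbol, word in SEPARATORS.items():
        spoken = spoken.replace(symbol, word)
    spoken = " ".join(spoken.lower().split())  # collapse repeated spaces
    if ext:
        spoken += " dot " + SPOKEN_EXTENSIONS.get(ext.lower(), ext.lower())
    return spoken.strip()


def speak_shortcut(combo: str) -> str:
    """'Ctrl + Z' -> 'control z'."""
    keys = [key.strip().lower() for key in combo.split("+")]
    return " ".join(KEY_NAMES.get(key, key) for key in keys)
```

Unknown extensions fall back to the raw lowercase extension. Review those by ear, because a TTS voice may read them as a word rather than as letters.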

4. Pronunciation Control

Phonetic Spelling

If a word is consistently mispronounced by your chosen voice, respell it phonetically in the script.

Example: "trapezii" → "trapezIi" to stress the final "ii"
Example: "Kubernetes" → "koo-ber-net-ees" if the voice struggles
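Once a respelling works, apply it everywhere the word appears, not just once. The sketch below uses a hypothetical dictionary that you would keep in sync with the Pronunciation Notes of the delivered script. Words are matched whole and case-insensitively.

```python
import re

# Hypothetical respelling dictionary; mirror your "Pronunciation Notes".
RESPELLINGS = {"Kubernetes": "koo-ber-net-ees"}


def apply_respellings(script: str, respellings: dict[str, str]) -> str:
    """Replace each listed word (whole word, any case) with its respelling."""
    for word, spoken in respellings.items():
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        # A function replacement keeps any backslashes in `spoken` literal.
        script = pattern.sub(lambda _match: spoken, script)
    return script
```

Matching whole words avoids mangling longer words that merely contain a dictionary entry.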

Capitalization for Emphasis (v3)

Capital letters increase emphasis in Eleven v3:

"It was a VERY long day."
"This is NOT a drill."

Phoneme Tags (Flash v2 / English v1 only)

For precise pronunciation of specific words, use SSML phoneme tags:

<phoneme alphabet="cmu-arpabet" ph="P R AH0 N AH0 N S IY0 EY1 SH AH0 N">pronunciation</phoneme>

Note: Phoneme tags only work with Eleven Flash v2 and Eleven English v1. Multilingual v2 and v3 do NOT support phoneme tags.
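Because phoneme tags won't work on most models, it can help to emit them only when the target model supports them. A quick sanity check on the ARPAbet string is also useful: CMU Pronouncing Dictionary entries mark every vowel with a stress digit (0, 1, or 2), so a bare vowel such as `IY` is usually a typo. This is a sketch. The model names are the display names used in this guide, not API model IDs.

```python
# Models that honor <phoneme> tags, per the note above (display names).
PHONEME_MODELS = {"Eleven Flash v2", "Eleven English v1"}
# ARPAbet vowel phonemes; in CMUdict each carries a stress digit.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def unstressed_vowels(arpabet: str) -> list[str]:
    """Return vowel phonemes that lack a stress digit, e.g. 'IY' instead of 'IY0'."""
    return [phone for phone in arpabet.split() if phone in VOWELS]


def phoneme_tag(word: str, arpabet: str, model: str) -> str:
    """Wrap word in a CMU ARPAbet phoneme tag, or return the plain word
    when the target model would not honor the tag."""
    if model not in PHONEME_MODELS:
        return word
    return f'<phoneme alphabet="cmu-arpabet" ph="{arpabet}">{word}</phoneme>'
```

On models without phoneme support, fall back to phonetic respelling in the script text instead.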

5. Pause & Pacing Control

For Eleven v2 / v2.5 / Flash v2 Models

Use <break time="x.xs" /> for natural pauses up to 3 seconds.

"Hold on, let me think." <break time="1.5s" /> "Alright, I've got it."

Caution: Too many break tags in one generation cause instability (speedups, artifacts). Use 1-2 per short script and 2-3 per long script, and prefer punctuation pauses where possible.
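The limits above (3 seconds per pause, a handful of tags per script) are easy to check before generating. The sketch below counts `<break time="x.xs" />` tags and flags any that run too long. The default limits come from this section. The function name is hypothetical.

```python
import re

BREAK_TAG = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)s"\s*/>')


def check_breaks(script: str, max_tags: int = 3, max_seconds: float = 3.0) -> list[str]:
    """Return warnings about break-tag count and duration."""
    durations = [float(value) for value in BREAK_TAG.findall(script)]
    warnings = []
    if len(durations) > max_tags:
        warnings.append(f"{len(durations)} break tags (limit {max_tags})")
    for seconds in durations:
        if seconds > max_seconds:
            warnings.append(f"break of {seconds}s exceeds {max_seconds}s")
    return warnings
```

An empty list means the script stays within the guidance. For a short script, consider passing `max_tags=2`.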

For Eleven v3 Models

v3 does NOT support <break>. Use:

Ellipses for hesitation: It was... a mistake.
Capitalization for emphasis: It was a VERY long day.
Punctuation for rhythm: commas, periods, dashes
Audio tags for breath/pause: [short pause], [long pause], [exhales], [sighs]
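When porting a v2 script to v3, the break tags must go. The sketch below swaps each one for an ellipsis, `[short pause]`, or `[long pause]` depending on its duration. The 1-second and 2-second cutoffs are assumptions, loosely based on the chapter-break pairing in section 2 (`<break time="2.0s" />` versus `[long pause]`). Tune them by ear.

```python
import re

# Matches a break tag plus the whitespace around it.
BREAK_TAG = re.compile(r'\s*<break\s+time="(\d+(?:\.\d+)?)s"\s*/>\s*')


def breaks_to_v3(script: str) -> str:
    """Replace v2-style break tags with v3-friendly cues.
    ASSUMED thresholds: <1s -> ellipsis, <2s -> [short pause], else [long pause]."""
    def cue(match: re.Match) -> str:
        seconds = float(match.group(1))
        if seconds < 1.0:
            return "... "
        if seconds < 2.0:
            return " [short pause] "
        return " [long pause] "

    return BREAK_TAG.sub(cue, script).strip()
```

For example, `It was <break time="0.5s" /> a mistake.` becomes "It was... a mistake.", which matches the ellipsis style recommended above.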

General Pacing Guidelines

Short sentences are more intelligible in TTS than complex compound sentences.
One idea per sentence — especially important for technical content.
Paragraph breaks create natural breathing room. Don't cram everything into one block.
Vary sentence length to avoid robotic cadence. Short. Then a bit longer. Then short again.
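A rough automated pass can surface run-on sentences and flat rhythm before a read-aloud check. The sketch below splits on sentence-ending punctuation and reports word counts. The 20-word threshold is an assumption, not a TTS limit. The split is naive: it assumes abbreviations were already expanded, and it treats an ellipsis followed by a space as a sentence end.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(script: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in SENTENCE_END.split(script.strip()) if s]


def sentence_lengths(script: str) -> list[int]:
    """Word count per sentence; a flat list of similar numbers suggests monotone cadence."""
    return [len(s.split()) for s in sentences(script)]


def long_sentences(script: str, max_words: int = 20) -> list[str]:
    """Sentences longer than max_words (20 is an assumed default)."""
    return [s for s in sentences(script) if len(s.split()) > max_words]
```

For "Short. Then a bit longer than that. Then short again." the lengths are 1, 6, and 3, which is the varied rhythm described above.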

6. Audio Tags for Expressive Delivery

For Eleven v3 (or tag-aware TTS models), inject audio tags to control emotion and non-verbal delivery. Tags must describe auditory actions only.

Tag Categories

Emotional Directions: [happy], [sad], [excited], [angry], [whisper], [annoyed], [appalled], [thoughtful], [surprised], [sarcastic], [curious], [mischievously], [professional], [reassuring], [frustrated], [delighted], [nervously], [cautiously], [cheerfully], [quizzically], [elated], [deadpan], [dramatically], [dismissive], [impressed], [warmly]

Non-Verbal Sounds: [laughs], [laughs harder], [starts laughing], [chuckles], [giggles], [giggling], [groaning], [sighs], [exhales], [exhales sharply], [inhales deeply], [clears throat], [short pause], [long pause], [wheezing], [snorts], [gasps], [muttering], [happy gasp]

Sound Effects (use sparingly): [gunshot], [applause], [clapping], [explosion], [swallows], [gulps], [record scratch], [binary beeping]

Overall Direction (scene context): [football], [wrestling match], [auctioneer], [news broadcast], [podcast studio], [hacker den], [library], [classroom]

Tag Placement Rules

Before the line for global mood: [sarcastic] Oh, you thought this was easy?
After the line for reaction: Another framework. [sighs]
Inline for mid-sentence shifts: It was working [excited] until it wasn't. [sighs]
At the start of a scene for ambient context: [podcast studio] or [library]
Do NOT replace narrative descriptions with tags. Keep the original text and add a matching tag beside it: He laughed loudly [laughs].
Do NOT use non-auditory tags: [standing], [grinning], [pacing], [music]
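Non-auditory tags slip in easily during drafting. The sketch below lists every bracketed tag and flags the non-auditory ones named above. The blocklist is only those four examples, so extend it as you find others. Bracketed placeholders such as `[URL]` will also show up in `audio_tags`.

```python
import re

AUDIO_TAG = re.compile(r"\[([^\[\]]+)\]")
# Non-auditory tags named in this section; extend as needed.
NON_AUDITORY = {"standing", "grinning", "pacing", "music"}


def audio_tags(script: str) -> list[str]:
    """All bracketed tags in order, without the brackets."""
    return [tag.strip() for tag in AUDIO_TAG.findall(script)]


def non_auditory_tags(script: str) -> list[str]:
    """Tags that describe something a listener cannot hear."""
    return [tag for tag in audio_tags(script) if tag.lower() in NON_AUDITORY]
```
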

Tag Density Guidelines

Content Type	Tag Count	Guidance
Explainer / Tutorial	4-8 tags	One tag per major section or emotional beat
Audiobook / Story	10-20 tags	Higher density for character emotion and scene shifts
Podcast Intro	2-4 tags	Confidence and energy tags for the hook
Product Demo	3-6 tags	Enthusiasm for features, professionalism for specs
Meditation / Calm	4-6 tags	Soft tags: `[whisper]`, `[softly]`, `[gently]`
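The ranges in this table can double as a check, and the same count fills the "Audio Tags" field of the output metadata. The sketch below copies the ranges verbatim. It counts any bracketed text as a tag, so remove placeholders like `[URL]` before counting.

```python
import re

TAG = re.compile(r"\[[^\[\]]+\]")
# Ranges copied from the tag density table.
TAG_RANGES = {
    "Explainer / Tutorial": (4, 8),
    "Audiobook / Story": (10, 20),
    "Podcast Intro": (2, 4),
    "Product Demo": (3, 6),
    "Meditation / Calm": (4, 6),
}


def tag_density(script: str, content_type: str) -> tuple[int, str]:
    """Return (tag count, 'too few' | 'ok' | 'too many') for a content type."""
    count = len(TAG.findall(script))
    low, high = TAG_RANGES[content_type]
    if count < low:
        return count, "too few"
    if count > high:
        return count, "too many"
    return count, "ok"
```
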

7. Model Selection Quick Guide

Model	Best For	Break Tags	Audio Tags	Phoneme Tags
Eleven v3	Expressive narration, character voices	No	Yes	No
Multilingual v2	Natural speech, multiple languages	Yes	No	No
Flash v2.5	Low latency, real-time	No	No	No
Flash v2	Fast generation, English	Yes	No	Yes
English v1	Legacy English content	Yes	No	Yes

Recommendation: Use v3 for character-driven or expressive content. Use Multilingual v2 for natural, neutral narration in any language.
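Using markup that the chosen model ignores is a common failure. The sketch below transcribes the capability table above into a dictionary and reports any markup in a script that the target model does not support. The keys are this guide's display names, not API model IDs. If ElevenLabs changes model support, update the table and the dictionary together.

```python
import re

# Capability matrix transcribed from the model table above.
CAPABILITIES = {
    "Eleven v3": {"break": False, "audio_tags": True, "phoneme": False},
    "Multilingual v2": {"break": True, "audio_tags": False, "phoneme": False},
    "Flash v2.5": {"break": False, "audio_tags": False, "phoneme": False},
    "Flash v2": {"break": True, "audio_tags": False, "phoneme": True},
    "English v1": {"break": True, "audio_tags": False, "phoneme": True},
}

PATTERNS = {
    "break": re.compile(r"<break\b"),
    "audio_tags": re.compile(r"\[[^\[\]]+\]"),
    "phoneme": re.compile(r"<phoneme\b"),
}


def unsupported_markup(script: str, model: str) -> list[str]:
    """Names of markup features present in the script but unsupported by the model."""
    caps = CAPABILITIES[model]
    return [feature for feature, pattern in PATTERNS.items()
            if pattern.search(script) and not caps[feature]]
```
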

8. Output Format

Deliver the final script in this structure:

# [Title] — TTS Script

## Metadata
- **Content Type**: [Explainer / Audiobook / Podcast / Demo / etc.]
- **Target Duration**: [e.g., 3-5 minutes]
- **TTS Model**: [Eleven v3 / Multilingual v2 / Flash v2.5]
- **Voice**: [Voice name or description]
- **Audio Tags**: [count]

## Normalized Voiceover Script

[Paste the fully normalized, tag-enhanced script here.]

[Use paragraph breaks for natural breathing room.]

## Pronunciation Notes
- "React": ree-act (not ray-act)
- "Kubernetes": koo-ber-net-ees
- [Any other words that need explicit direction]

## Post-Production Notes (optional)
- Speed adjustment: [1.0x / 1.15x / 1.25x]
- Background music: [genre / tempo / volume relative to voice]

9. Anti-Patterns (NEVER Do)

Leaving raw symbols in script — $100, API, 2024-01-01, Ctrl+Z must be normalized.
Long run-on sentences — TTS drifts on complex clauses. Break them up.
No audio tags on v3 — Flat TTS sounds robotic on expressive models.
Non-auditory tags — [grinning], [pacing], [music] will be spoken or ignored.
Too many <break> tags — Causes instability (speedups, artifacts). Max 2-3.
Using <break> with v3 — v3 does not support SSML break tags.
Inconsistent pacing — Avoid monotone sentence length. Vary short and long.
Generic CTAs — "Thanks for watching" and "like and subscribe" are lazy. Write topic-specific closings.
Ignoring model capabilities — Don't use phoneme tags with v3 or Multilingual v2. They won't work.
Writing for the eye — Read the script aloud (or imagine it spoken) before delivering. If it sounds awkward, rewrite it.
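A final scan can catch the first anti-pattern, raw symbols left in the script. The sketch below strips SSML tags such as `<break time="1.5s" />`, whose digits are legitimate. It then reports any token that still contains a digit or one of a few symbols. The symbol set is illustrative, not exhaustive. It deliberately ignores all-caps words, since v3 uses capitals for emphasis, so acronyms still need a manual check.

```python
import re

# Tokens containing a digit or a symbol that TTS may misread (illustrative set).
RAW_TOKEN = re.compile(r"\S*[\d$£€¥%&@#+/]\S*")
SSML_TAG = re.compile(r"<[^>]+>")


def raw_tokens(script: str) -> list[str]:
    """Return tokens that look un-normalized, with edge punctuation trimmed."""
    text = SSML_TAG.sub(" ", script)
    return [token.strip(".,;:!?") for token in RAW_TOKEN.findall(text)]
```
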

10. Quality Checklist

Before delivering the script, verify:


Repository: jarmen423/skills

First Seen: Apr 23, 2026
