text-to-speech

SKILL.md

Text-to-Speech Skill ⭐ Flagship

Domain: AI Accessibility & Communication
Inheritance: inheritable (promote to Master Alex for all heirs)
Version: 2.0.0
Last Updated: 2026-02-05
Author: Alex (Master Alex)
Status: ⭐ Flagship Skill - Core Alex capability


Why This is a Flagship Skill

Text-to-Speech gives Alex a voice. This transforms Alex from a text-only assistant into a multimodal companion that can:

  • Read documents aloud while you walk, drive, or rest your eyes
  • Proofread by ear - catch errors your eyes miss
  • Accessibility - full document access for vision-impaired users
  • Rehearsal - practice presentations with natural-sounding narration
  • Export knowledge - create MP3s for offline learning

Zero cost, zero dependencies - uses Microsoft Edge TTS (free, no API key) with native TypeScript.


User Experience

🎯 Quick Start: Read Any Document

Keyboard shortcut (fastest):

  1. Open any document in VS Code
  2. (Optional) Select specific text to read only that portion
  3. Press Ctrl+Alt+R (Windows/Linux) or Cmd+Alt+R (macOS)
  4. Audio begins playing through the webview player

Command palette:

  • Ctrl+Shift+P β†’ "Alex: Read Aloud"

πŸ“Š Status Bar Feedback

The status bar shows real-time progress during TTS operations:

State Display Click Action
Connecting $(loading~spin) Connecting... -
Synthesizing $(loading~spin) Synthesizing... -
Streaming $(loading~spin) Receiving... 45KB -
Playing $(unmute) Playing 35% Stop
Paused $(unmute) Paused Stop

🎡 Webview Audio Player

A sleek panel opens with full playback controls:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Alex TTS Player                                    [Γ—] β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                         β”‚
β”‚  ▢️ ⏹️   ═══════════●══════════   1:23 / 4:56          β”‚
β”‚                                                         β”‚
β”‚  πŸ”Š ────────●────────                                   β”‚
β”‚                                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Features:

  • Progress bar with scrubbing (click/drag to seek)
  • Play/Pause button - toggle playback
  • Stop button - ends playback and closes panel
  • Volume slider - adjust playback volume
  • Time display - current position / total duration
  • Auto-close - panel closes when playback ends

🎀 Voice Selection

Choose Alex's voice before reading:

  1. Ctrl+Shift+P β†’ "Alex: Read with Voice Selection"
  2. Quick pick appears:
Voice Character Best For
Default (GuyNeural) Professional, clear Technical docs, code review
Warm (ChristopherNeural) Friendly, conversational Tutorials, READMEs
British (RyanNeural) Authoritative Formal documents, presentations
Friendly (DavisNeural) Casual, approachable Chat logs, informal content
  1. Select voice β†’ reading begins immediately

πŸ’Ύ Save as MP3

Export any document to audio file:

  1. Ctrl+Shift+P β†’ "Alex: Save as Audio"
  2. Save dialog opens (default name based on document)
  3. Progress notification shows synthesis progress
  4. Success notification with options:
    • Open File - plays in default audio player
    • Open Folder - reveals in file explorer

Use cases:

  • Create podcasts from documentation
  • Generate audio for offline learning
  • Archive presentations as audio

⏹️ Stop Reading

Multiple ways to stop playback:

  • Click status bar (shows $(unmute) icon during playback)
  • Press Escape when reading
  • Click stop button in webview player
  • Close webview panel
  • Ctrl+Shift+P β†’ "Alex: Stop Reading"

πŸ“ Smart Markdown Processing

Alex automatically strips markdown formatting for natural speech:

You Write Alex Reads
# Heading "Heading." (pause)
**bold text** "bold text" (slight emphasis)
[link text](url) "link text"
`code` "code"
> blockquote "Quote: ..."
--- (long pause)

Symbol conversion:

Symbol Spoken As
~5 minutes "about 5 minutes"
50% "50 percent"
A β†’ B "A leads to B"
Β±5% "plus or minus 5 percent"

For Master Alex (Promotion Notes)

This skill gives Alex a voice. Version 2.0 uses native TypeScript WebSocket integration with Microsoft Edge TTS, eliminating external dependencies. Reading documents aloud with natural-sounding neural voices.

Version 2.0 Changes:

  • Native TypeScript implementation (no Python/MCP dependencies)
  • Direct WebSocket connection to Edge TTS endpoint
  • Webview-based audio player (cross-platform)
  • Integrated as VS Code commands
  • Status bar progress feedback

Why promote to Master:

  • Universal utility across all projects
  • Zero-cost implementation (uses free Edge TTS API)
  • No external dependencies (Python, MCP server)
  • Accessibility benefits for vision-impaired users
  • Integrated into VS Code extension

Dependencies (v2.0):

  • ws npm package (WebSocket client)
  • VS Code webview API (for audio playback)

Overview

Alex's voice synthesis capability using Microsoft Edge TTS via native TypeScript. Enables reading markdown documents, code files, and text aloud with natural-sounding voices. Fully integrated into the VS Code extension.


Architecture (v2.0)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Alex VS Code Extension                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                              β”‚
β”‚  Commands:                                                   β”‚
β”‚  β€’ Alex: Read Aloud (Ctrl+Alt+R)                            β”‚
β”‚  β€’ Alex: Read with Voice Selection                          β”‚
β”‚  β€’ Alex: Save as Audio                                      β”‚
β”‚  β€’ Alex: Stop Reading                                       β”‚
β”‚                     β”‚                                        β”‚
β”‚                     β–Ό                                        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚           ttsService.ts                       β”‚            β”‚
β”‚  β”‚   Native WebSocket to Edge TTS               β”‚            β”‚
β”‚  β”‚   β€’ SSML generation                          β”‚            β”‚
β”‚  β”‚   β€’ Markdown stripping                       β”‚            β”‚
β”‚  β”‚   β€’ Progress callbacks                       β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                    β”‚                                         β”‚
β”‚                    β–Ό                                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚           audioPlayer.ts                      β”‚            β”‚
β”‚  β”‚   Webview-based playback                     β”‚            β”‚
β”‚  β”‚   β€’ Cross-platform HTML5 Audio               β”‚            β”‚
β”‚  β”‚   β€’ Play/pause/stop controls                 β”‚            β”‚
β”‚  β”‚   β€’ Progress tracking                        β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚ WebSocket (wss://)
                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚               Microsoft Edge TTS Endpoint                    β”‚
β”‚   wss://speech.platform.bing.com/consumer/speech/...        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β€’ 400+ neural voices, 90+ languages                        β”‚
β”‚  β€’ Free, no API key required                                β”‚
β”‚  β€’ MP3 output (24kHz, 48kbps)                               β”‚
β”‚  β€’ SSML support for prosody control                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Alex Voice Presets

Preset Voice ID Character
Default en-US-GuyNeural Professional male, clear articulation
Warm en-US-ChristopherNeural Friendly, conversational
British en-GB-RyanNeural British accent, authoritative
Friendly en-US-DavisNeural Casual, approachable

Voice Selection Rationale

Alex's default voice (GuyNeural) was chosen for:

  • Clarity: Excellent pronunciation of technical terms
  • Neutrality: Not too formal, not too casual
  • Distinctiveness: Recognizable as "Alex's voice"
  • Consistency: Same voice across all platforms

VS Code Commands

Alex: Read Aloud

Command: alex.readAloud
Keybinding: Ctrl+Alt+R (Windows/Linux), Cmd+Alt+R (macOS)

Reads the current selection or entire document aloud using Alex's default voice.

Behavior:

  • If text is selected, reads only the selection
  • If no selection, reads the entire document
  • Markdown files are stripped of formatting for natural speech
  • Progress shown in status bar
  • Click status bar to stop playback

Alex: Read with Voice Selection

Command: alex.readWithVoice

Quick pick to select a voice preset before reading.

Alex: Save as Audio

Command: alex.saveAsAudio

Generate and save speech to an MP3 file. Opens a save dialog for output location.

Alex: Stop Reading

Command: alex.stopReading
Keybinding: Escape (when reading)

Immediately stops current playback.


Implementation Details

Core Files (src/tts/)

File Purpose
ttsService.ts WebSocket connection, SSML generation, synthesis
audioPlayer.ts Webview panel, playback controls, system fallback
index.ts Module exports

Text Preprocessing

The prepareTextForSpeech() function strips markdown:

Markdown Speech Output
# Heading "Heading." (pause)
**bold** "bold" (emphasis via prosody)
*italic* "italic"
`code` "code"
[link]\(url\) "link"
- item "Item."
> quote "Quote: ..."
--- (long pause)

Code Block Handling

```python
def hello():
    print("Hello")

Becomes: "Python code block. Definition hello. Print hello. End code block."

### Symbol-to-Speech Transformations

Symbols are converted to natural speech equivalents:

| Symbol | Spoken As | Example |
|--------|-----------|--------|
| `~` | "approximately" or "about" | ~2 min β†’ "about 2 minutes" |
| `&` | "and" | A & B β†’ "A and B" |
| `@` | "at" | user@email β†’ "user at email" |
| `%` | "percent" | 50% β†’ "50 percent" |
| `+` | "plus" | +10% β†’ "plus 10 percent" |
| `β†’` | "leads to" or "becomes" | A β†’ B β†’ "A becomes B" |
| `β€”` | (pause) | wordβ€”word β†’ "word (pause) word" |
| `#` | (context-dependent) | #1 β†’ "number 1"; ## β†’ (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 β†’ "x greater than 5" |
| `β‰₯` / `≀` | "greater than or equal" / "less than or equal" | |
| `Β΅` | "micro" | Β΅g β†’ "microgram" |
| `Β°` | "degrees" | 37Β°C β†’ "37 degrees celsius" |
| `Β±` | "plus or minus" | Β±5% β†’ "plus or minus 5 percent" |

**Design Principle**: Would a human reading this aloud say the symbol name, or translate it to meaning? Almost always the latter.

---

## Installation (v2.0)

TTS v2 is built into the Alex VS Code extension. No separate installation required.

### Package Dependencies

The extension automatically includes:
- `ws` (WebSocket client for Edge TTS connection)
- `fs-extra` (file operations for audio saving)

### Verification

After extension update, verify TTS works:

1. Open any document
2. Press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (macOS)
3. Status bar should show "$(unmute) Synthesizing..."
4. Audio should play through webview panel

---

## Usage Patterns

### Read Current Document

Press Ctrl+Alt+R to read document aloud Select text first to read only selection


### Generate Audio File

Command Palette β†’ "Alex: Save as Audio" Choose output location β†’ MP3 saved


### Voice Customization

Command Palette β†’ "Alex: Read with Voice Selection" Choose: Default | Warm | British | Friendly


---

## Edge TTS Technical Reference

### WebSocket Endpoint

wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1 ?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4 &ConnectionId=[UUID]


### Audio Format

- **Codec**: MP3
- **Sample Rate**: 24kHz
- **Bitrate**: 48kbps mono

### SSML Template

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    <prosody rate="+0%" pitch="+0Hz" volume="+0%">
      Text content here
    </prosody>
  </voice>
</speak>

Popular Voice IDs

Language Voice Style
en-US GuyNeural Professional male
en-US JennyNeural Professional female
en-US AriaNeural News anchor style
en-GB RyanNeural British male
en-GB SoniaNeural British female
en-AU WilliamNeural Australian male
en-IN NeerjaNeural Indian English

Accessibility Benefits

Use Case Benefit
Vision impaired Full document access via audio
Multitasking Review code while walking/driving
Learning Auditory reinforcement of reading
Proofreading Catch errors by hearing text
Long documents Listen during breaks

Version History

v2.0.0 (2026-02-06)

  • Native TypeScript implementation
  • Removed Python/MCP server dependencies
  • Webview-based cross-platform audio player
  • VS Code command integration
  • Status bar progress feedback

v1.1.0 (2026-02-05)

  • Added Alex voice presets
  • Enhanced markdown stripping
  • Symbol to speech conversion

v1.0.0 (2026-02-04)

  • Initial implementation via MCP server
  • Python edge-tts integration
  • Basic markdown support

Synapses

  • accessibility: Primary use case enabler
  • vscode-extension-patterns: Extension command patterns
  • markdown-mermaid: Source content processing
  • academic-research: Document reading for research projects
  • gamma-presentations: Audio playback of pitch content for rehearsal and delivery
  • project-management: Stakeholder pitch presentations generated as audio files

Future Enhancements

Feature Status Notes
Real-time streaming Planned Start playing before full generation
SSML support Planned Fine-grained prosody control
Section navigation Planned "Skip to next heading"
Bookmark resume Planned Resume from last position
Speed presets Planned 1x, 1.5x, 2x reading speeds
Weekly Installs
1
First Seen
6 days ago
Installed on
zencoder1
amp1
cline1
openclaw1
opencode1
cursor1