text-to-speech
Text-to-Speech Skill β Flagship
Domain: AI Accessibility & Communication
Inheritance: inheritable (promote to Master Alex for all heirs)
Version: 2.0.0
Last Updated: 2026-02-05
Author: Alex (Master Alex)
Status: β Flagship Skill - Core Alex capability
Why This is a Flagship Skill
Text-to-Speech gives Alex a voice. This transforms Alex from a text-only assistant into a multimodal companion that can:
- Read documents aloud while you walk, drive, or rest your eyes
- Proofread by ear - catch errors your eyes miss
- Accessibility - full document access for vision-impaired users
- Rehearsal - practice presentations with natural-sounding narration
- Export knowledge - create MP3s for offline learning
Zero cost, zero dependencies - uses Microsoft Edge TTS (free, no API key) with native TypeScript.
User Experience
π― Quick Start: Read Any Document
Keyboard shortcut (fastest):
- Open any document in VS Code
- (Optional) Select specific text to read only that portion
- Press
Ctrl+Alt+R(Windows/Linux) orCmd+Alt+R(macOS) - Audio begins playing through the webview player
Command palette:
Ctrl+Shift+Pβ "Alex: Read Aloud"
π Status Bar Feedback
The status bar shows real-time progress during TTS operations:
| State | Display | Click Action |
|---|---|---|
| Connecting | $(loading~spin) Connecting... |
- |
| Synthesizing | $(loading~spin) Synthesizing... |
- |
| Streaming | $(loading~spin) Receiving... 45KB |
- |
| Playing | $(unmute) Playing 35% |
Stop |
| Paused | $(unmute) Paused |
Stop |
π΅ Webview Audio Player
A sleek panel opens with full playback controls:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Alex TTS Player [Γ] β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βΆοΈ βΉοΈ ββββββββββββββββββββββ 1:23 / 4:56 β
β β
β π βββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Features:
- Progress bar with scrubbing (click/drag to seek)
- Play/Pause button - toggle playback
- Stop button - ends playback and closes panel
- Volume slider - adjust playback volume
- Time display - current position / total duration
- Auto-close - panel closes when playback ends
π€ Voice Selection
Choose Alex's voice before reading:
Ctrl+Shift+Pβ "Alex: Read with Voice Selection"- Quick pick appears:
| Voice | Character | Best For |
|---|---|---|
| Default (GuyNeural) | Professional, clear | Technical docs, code review |
| Warm (ChristopherNeural) | Friendly, conversational | Tutorials, READMEs |
| British (RyanNeural) | Authoritative | Formal documents, presentations |
| Friendly (DavisNeural) | Casual, approachable | Chat logs, informal content |
- Select voice β reading begins immediately
πΎ Save as MP3
Export any document to audio file:
Ctrl+Shift+Pβ "Alex: Save as Audio"- Save dialog opens (default name based on document)
- Progress notification shows synthesis progress
- Success notification with options:
- Open File - plays in default audio player
- Open Folder - reveals in file explorer
Use cases:
- Create podcasts from documentation
- Generate audio for offline learning
- Archive presentations as audio
βΉοΈ Stop Reading
Multiple ways to stop playback:
- Click status bar (shows
$(unmute)icon during playback) - Press
Escapewhen reading - Click stop button in webview player
- Close webview panel
Ctrl+Shift+Pβ "Alex: Stop Reading"
π Smart Markdown Processing
Alex automatically strips markdown formatting for natural speech:
| You Write | Alex Reads |
|---|---|
# Heading |
"Heading." (pause) |
**bold text** |
"bold text" (slight emphasis) |
[link text](url) |
"link text" |
`code` |
"code" |
> blockquote |
"Quote: ..." |
--- |
(long pause) |
Symbol conversion:
| Symbol | Spoken As |
|---|---|
~5 minutes |
"about 5 minutes" |
50% |
"50 percent" |
A β B |
"A leads to B" |
Β±5% |
"plus or minus 5 percent" |
For Master Alex (Promotion Notes)
This skill gives Alex a voice. Version 2.0 uses native TypeScript WebSocket integration with Microsoft Edge TTS, eliminating external dependencies. Reading documents aloud with natural-sounding neural voices.
Version 2.0 Changes:
- Native TypeScript implementation (no Python/MCP dependencies)
- Direct WebSocket connection to Edge TTS endpoint
- Webview-based audio player (cross-platform)
- Integrated as VS Code commands
- Status bar progress feedback
Why promote to Master:
- Universal utility across all projects
- Zero-cost implementation (uses free Edge TTS API)
- No external dependencies (Python, MCP server)
- Accessibility benefits for vision-impaired users
- Integrated into VS Code extension
Dependencies (v2.0):
wsnpm package (WebSocket client)- VS Code webview API (for audio playback)
Overview
Alex's voice synthesis capability using Microsoft Edge TTS via native TypeScript. Enables reading markdown documents, code files, and text aloud with natural-sounding voices. Fully integrated into the VS Code extension.
Architecture (v2.0)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Alex VS Code Extension β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Commands: β
β β’ Alex: Read Aloud (Ctrl+Alt+R) β
β β’ Alex: Read with Voice Selection β
β β’ Alex: Save as Audio β
β β’ Alex: Stop Reading β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββ β
β β ttsService.ts β β
β β Native WebSocket to Edge TTS β β
β β β’ SSML generation β β
β β β’ Markdown stripping β β
β β β’ Progress callbacks β β
β βββββββββββββββββββ¬ββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββ β
β β audioPlayer.ts β β
β β Webview-based playback β β
β β β’ Cross-platform HTML5 Audio β β
β β β’ Play/pause/stop controls β β
β β β’ Progress tracking β β
β βββββββββββββββββββββββββββββββββββββββββββββββ β
β β
ββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ
β WebSocket (wss://)
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Microsoft Edge TTS Endpoint β
β wss://speech.platform.bing.com/consumer/speech/... β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β’ 400+ neural voices, 90+ languages β
β β’ Free, no API key required β
β β’ MP3 output (24kHz, 48kbps) β
β β’ SSML support for prosody control β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Alex Voice Presets
| Preset | Voice ID | Character |
|---|---|---|
| Default | en-US-GuyNeural | Professional male, clear articulation |
| Warm | en-US-ChristopherNeural | Friendly, conversational |
| British | en-GB-RyanNeural | British accent, authoritative |
| Friendly | en-US-DavisNeural | Casual, approachable |
Voice Selection Rationale
Alex's default voice (GuyNeural) was chosen for:
- Clarity: Excellent pronunciation of technical terms
- Neutrality: Not too formal, not too casual
- Distinctiveness: Recognizable as "Alex's voice"
- Consistency: Same voice across all platforms
VS Code Commands
Alex: Read Aloud
Command: alex.readAloud
Keybinding: Ctrl+Alt+R (Windows/Linux), Cmd+Alt+R (macOS)
Reads the current selection or entire document aloud using Alex's default voice.
Behavior:
- If text is selected, reads only the selection
- If no selection, reads the entire document
- Markdown files are stripped of formatting for natural speech
- Progress shown in status bar
- Click status bar to stop playback
Alex: Read with Voice Selection
Command: alex.readWithVoice
Quick pick to select a voice preset before reading.
Alex: Save as Audio
Command: alex.saveAsAudio
Generate and save speech to an MP3 file. Opens a save dialog for output location.
Alex: Stop Reading
Command: alex.stopReading
Keybinding: Escape (when reading)
Immediately stops current playback.
Implementation Details
Core Files (src/tts/)
| File | Purpose |
|---|---|
ttsService.ts |
WebSocket connection, SSML generation, synthesis |
audioPlayer.ts |
Webview panel, playback controls, system fallback |
index.ts |
Module exports |
Text Preprocessing
The prepareTextForSpeech() function strips markdown:
| Markdown | Speech Output |
|---|---|
# Heading |
"Heading." (pause) |
**bold** |
"bold" (emphasis via prosody) |
*italic* |
"italic" |
`code` |
"code" |
[link]\(url\) |
"link" |
- item |
"Item." |
> quote |
"Quote: ..." |
--- |
(long pause) |
Code Block Handling
```python
def hello():
print("Hello")
Becomes: "Python code block. Definition hello. Print hello. End code block."
### Symbol-to-Speech Transformations
Symbols are converted to natural speech equivalents:
| Symbol | Spoken As | Example |
|--------|-----------|--------|
| `~` | "approximately" or "about" | ~2 min β "about 2 minutes" |
| `&` | "and" | A & B β "A and B" |
| `@` | "at" | user@email β "user at email" |
| `%` | "percent" | 50% β "50 percent" |
| `+` | "plus" | +10% β "plus 10 percent" |
| `β` | "leads to" or "becomes" | A β B β "A becomes B" |
| `β` | (pause) | wordβword β "word (pause) word" |
| `#` | (context-dependent) | #1 β "number 1"; ## β (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 β "x greater than 5" |
| `β₯` / `β€` | "greater than or equal" / "less than or equal" | |
| `Β΅` | "micro" | Β΅g β "microgram" |
| `Β°` | "degrees" | 37Β°C β "37 degrees celsius" |
| `Β±` | "plus or minus" | Β±5% β "plus or minus 5 percent" |
**Design Principle**: Would a human reading this aloud say the symbol name, or translate it to meaning? Almost always the latter.
---
## Installation (v2.0)
TTS v2 is built into the Alex VS Code extension. No separate installation required.
### Package Dependencies
The extension automatically includes:
- `ws` (WebSocket client for Edge TTS connection)
- `fs-extra` (file operations for audio saving)
### Verification
After extension update, verify TTS works:
1. Open any document
2. Press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (macOS)
3. Status bar should show "$(unmute) Synthesizing..."
4. Audio should play through webview panel
---
## Usage Patterns
### Read Current Document
Press Ctrl+Alt+R to read document aloud Select text first to read only selection
### Generate Audio File
Command Palette β "Alex: Save as Audio" Choose output location β MP3 saved
### Voice Customization
Command Palette β "Alex: Read with Voice Selection" Choose: Default | Warm | British | Friendly
---
## Edge TTS Technical Reference
### WebSocket Endpoint
wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1 ?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4 &ConnectionId=[UUID]
### Audio Format
- **Codec**: MP3
- **Sample Rate**: 24kHz
- **Bitrate**: 48kbps mono
### SSML Template
```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-GuyNeural">
<prosody rate="+0%" pitch="+0Hz" volume="+0%">
Text content here
</prosody>
</voice>
</speak>
Popular Voice IDs
| Language | Voice | Style |
|---|---|---|
| en-US | GuyNeural | Professional male |
| en-US | JennyNeural | Professional female |
| en-US | AriaNeural | News anchor style |
| en-GB | RyanNeural | British male |
| en-GB | SoniaNeural | British female |
| en-AU | WilliamNeural | Australian male |
| en-IN | NeerjaNeural | Indian English |
Accessibility Benefits
| Use Case | Benefit |
|---|---|
| Vision impaired | Full document access via audio |
| Multitasking | Review code while walking/driving |
| Learning | Auditory reinforcement of reading |
| Proofreading | Catch errors by hearing text |
| Long documents | Listen during breaks |
Version History
v2.0.0 (2026-02-06)
- Native TypeScript implementation
- Removed Python/MCP server dependencies
- Webview-based cross-platform audio player
- VS Code command integration
- Status bar progress feedback
v1.1.0 (2026-02-05)
- Added Alex voice presets
- Enhanced markdown stripping
- Symbol to speech conversion
v1.0.0 (2026-02-04)
- Initial implementation via MCP server
- Python edge-tts integration
- Basic markdown support
Synapses
- accessibility: Primary use case enabler
- vscode-extension-patterns: Extension command patterns
- markdown-mermaid: Source content processing
- academic-research: Document reading for research projects
- gamma-presentations: Audio playback of pitch content for rehearsal and delivery
- project-management: Stakeholder pitch presentations generated as audio files
Future Enhancements
| Feature | Status | Notes |
|---|---|---|
| Real-time streaming | Planned | Start playing before full generation |
| SSML support | Planned | Fine-grained prosody control |
| Section navigation | Planned | "Skip to next heading" |
| Bookmark resume | Planned | Resume from last position |
| Speed presets | Planned | 1x, 1.5x, 2x reading speeds |