# Text-to-Speech Skill ⭐ Flagship
- **Domain**: AI Accessibility & Communication
- **Inheritance**: inheritable (promote to Master Alex for all heirs)
- **Version**: 2.0.0
- **Last Updated**: 2026-02-05
- **Author**: Alex (Master Alex)
- **Status**: ⭐ Flagship Skill - Core Alex capability
## Why This is a Flagship Skill
Text-to-Speech gives Alex a voice. It turns Alex from a text-only assistant into a multimodal companion that supports:
- Hands-free reading - listen to documents while you walk, drive, or rest your eyes
- Proofreading by ear - catch errors your eyes miss
- Accessibility - full document access for vision-impaired users
- Rehearsal - practice presentations with natural-sounding narration
- Audio export - create MP3s for offline learning
Zero cost, minimal dependencies: uses the free Microsoft Edge TTS service (no API key) from native TypeScript, with no Python or MCP server required.
## User Experience
### 🎯 Quick Start: Read Any Document
Keyboard shortcut (fastest):
- Open any document in VS Code
- (Optional) Select specific text to read only that portion
- Press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (macOS)
- Audio begins playing through the webview player
Command palette: `Ctrl+Shift+P` → "Alex: Read Aloud"
### 📊 Status Bar Feedback
The status bar shows real-time progress during TTS operations:
| State | Display | Click Action |
|---|---|---|
| Connecting | `$(loading~spin) Connecting...` | - |
| Synthesizing | `$(loading~spin) Synthesizing...` | - |
| Streaming | `$(loading~spin) Receiving... 45KB` | - |
| Playing | `$(unmute) Playing 35%` | Stop |
| Paused | `$(unmute) Paused` | Stop |
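The state table maps naturally onto a small discriminated union. The sketch below is hypothetical (`TtsState`, `statusBarView`, and the field names are illustrative, not taken from the extension); it assumes the status bar text and click command are derived from one formatting function and that "KB" means 1024 bytes.

```typescript
// Hypothetical model of the TTS lifecycle shown in the status bar table.
type TtsState =
  | { kind: "connecting" }
  | { kind: "synthesizing" }
  | { kind: "streaming"; bytesReceived: number }
  | { kind: "playing"; percent: number }
  | { kind: "paused" };

// What a vscode.StatusBarItem would be given: label text plus an optional click command.
interface StatusBarView {
  text: string;
  command?: string;
}

const STOP_COMMAND = "alex.stopReading";

function statusBarView(state: TtsState): StatusBarView {
  switch (state.kind) {
    case "connecting":
      return { text: "$(loading~spin) Connecting..." };
    case "synthesizing":
      return { text: "$(loading~spin) Synthesizing..." };
    case "streaming":
      // Assumes KB = 1024 bytes, rounded to the nearest whole number.
      return { text: `$(loading~spin) Receiving... ${Math.round(state.bytesReceived / 1024)}KB` };
    case "playing":
      return { text: `$(unmute) Playing ${Math.round(state.percent)}%`, command: STOP_COMMAND };
    case "paused":
      return { text: "$(unmute) Paused", command: STOP_COMMAND };
  }
}
```

In an extension, the result would simply be copied onto a `vscode.StatusBarItem` (`item.text = view.text; item.command = view.command;`), so only the Playing and Paused states become clickable.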
### 🎵 Webview Audio Player
A sleek panel opens with full playback controls:
┌─────────────────────────────────────────────────────────┐
│ Alex TTS Player [×] │
├─────────────────────────────────────────────────────────┤
│ │
│ ▶️ ⏹️ ═══════════●══════════ 1:23 / 4:56 │
│ │
│ 🔊 ────────●──────── │
│ │
└─────────────────────────────────────────────────────────┘
Features:
- Progress bar with scrubbing (click/drag to seek)
- Play/Pause button - toggle playback
- Stop button - ends playback and closes panel
- Volume slider - adjust playback volume
- Time display - current position / total duration
- Auto-close - panel closes when playback ends
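One cross-platform way to get these controls is to let the webview's HTML5 `<audio controls>` element supply play/pause, seeking, volume, and the time display, and to post a message back to the extension on Stop or when playback ends so the panel can close itself. This is a sketch under that assumption: `buildPlayerHtml` is a hypothetical name, the MP3 is inlined as a base64 `data:` URI, and the Content Security Policy uses a per-render nonce. `acquireVsCodeApi()` is the real webview-side messaging API; the extension side would listen with `panel.webview.onDidReceiveMessage` and call `panel.dispose()`.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: renders webview HTML that plays an in-memory MP3.
function buildPlayerHtml(mp3: Uint8Array): string {
  const nonce = randomBytes(16).toString("hex");
  const src = `data:audio/mpeg;base64,${Buffer.from(mp3).toString("base64")}`;
  return `<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta http-equiv="Content-Security-Policy"
        content="default-src 'none'; media-src data:; script-src 'nonce-${nonce}';">
  <title>Alex TTS Player</title>
</head>
<body>
  <!-- Native controls provide play/pause, a seekable progress bar, volume, and time. -->
  <audio id="player" controls autoplay src="${src}"></audio>
  <button id="stop">Stop</button>
  <script nonce="${nonce}">
    const vscode = acquireVsCodeApi();
    const player = document.getElementById("player");
    // Ask the extension to dispose the panel on Stop or when playback ends.
    document.getElementById("stop").addEventListener("click", () => {
      player.pause();
      vscode.postMessage({ type: "stop" });
    });
    player.addEventListener("ended", () => vscode.postMessage({ type: "ended" }));
  </script>
</body>
</html>`;
}
```

Inlining keeps the sketch self-contained; for long documents, writing the MP3 to disk and loading it through `webview.asWebviewUri` would likely scale better.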
### 🎤 Voice Selection
Choose Alex's voice before reading:
- `Ctrl+Shift+P` → "Alex: Read with Voice Selection"
- A quick pick appears:
| Voice | Character | Best For |
|---|---|---|
| Default (GuyNeural) | Professional, clear | Technical docs, code review |
| Warm (ChristopherNeural) | Friendly, conversational | Tutorials, READMEs |
| British (RyanNeural) | Authoritative | Formal documents, presentations |
| Friendly (DavisNeural) | Casual, approachable | Chat logs, informal content |
- Select voice → reading begins immediately
### 💾 Save as MP3
Export any document to an audio file:
- `Ctrl+Shift+P` → "Alex: Save as Audio"
- A save dialog opens (default name based on the document)
- Progress notification shows synthesis progress
- Success notification with options:
- Open File - plays in default audio player
- Open Folder - reveals in file explorer
Use cases:
- Create podcasts from documentation
- Generate audio for offline learning
- Archive presentations as audio
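A minimal sketch of the save step, assuming Node's `fs/promises` rather than the `fs-extra` package listed under Package Dependencies; `defaultAudioFileName` and `saveMp3` are hypothetical names. In the extension, the suggested name would seed the `defaultUri` of `vscode.window.showSaveDialog` (optionally with `filters: { Audio: ["mp3"] }`), and the chosen path would be passed to the write step.

```typescript
import { writeFile } from "node:fs/promises";
import * as path from "node:path";

// Derive a default file name from the document path: "notes/README.md" -> "README.mp3".
function defaultAudioFileName(documentPath: string): string {
  const stem = path.basename(documentPath, path.extname(documentPath));
  return `${stem || "untitled"}.mp3`;
}

// Write synthesized MP3 bytes to the location the user picked.
async function saveMp3(targetPath: string, mp3: Uint8Array): Promise<string> {
  // Ensure the file ends in .mp3 even if the user typed a bare name.
  const finalPath = path.extname(targetPath).toLowerCase() === ".mp3" ? targetPath : `${targetPath}.mp3`;
  await writeFile(finalPath, mp3);
  return finalPath;
}
```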
### ⏹️ Stop Reading
Multiple ways to stop playback:
- Click the status bar item (shows the `$(unmute)` icon during playback)
- Press `Escape` while reading
- Click the stop button in the webview player
- Close the webview panel
- `Ctrl+Shift+P` → "Alex: Stop Reading"
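Because five different entry points can end playback, and several may fire for a single user action (clicking Stop also closes the panel, for example), it helps to route them all to one idempotent stop routine. The sketch assumes an `AbortController` is used to cancel in-flight synthesis; `ReadingSession` is a hypothetical class, not the extension's.

```typescript
// Hypothetical per-reading session: every stop path calls session.stop().
class ReadingSession {
  private readonly controller = new AbortController();
  private readonly cleanups: Array<() => void> = [];
  private stopped = false;

  // Pass this signal to the synthesis loop so it can bail out early.
  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Register teardown work: close the socket, dispose the webview, reset the status bar.
  onStop(cleanup: () => void): void {
    this.cleanups.push(cleanup);
  }

  // Idempotent: a second call (e.g. panel close right after the Stop button) is a no-op.
  stop(): void {
    if (this.stopped) return;
    this.stopped = true;
    this.controller.abort();
    for (const cleanup of this.cleanups) cleanup();
  }
}
```

Each trigger would then be a one-liner, for example `panel.onDidDispose(() => session.stop())` for the panel-close path.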
### 📝 Smart Markdown Processing
Alex automatically strips markdown formatting for natural speech:
| You Write | Alex Reads |
|---|---|
| `# Heading` | "Heading." (pause) |
| `**bold text**` | "bold text" (slight emphasis) |
| `[link text](url)` | "link text" |
| `` `code` `` | "code" |
| `> blockquote` | "Quote: ..." |
| `---` | (long pause) |
Symbol conversion:
| Symbol | Spoken As |
|---|---|
| `~5 minutes` | "about 5 minutes" |
| `50%` | "50 percent" |
| `A → B` | "A leads to B" |
| `±5%` | "plus or minus 5 percent" |
## For Master Alex (Promotion Notes)
This skill gives Alex a voice. Version 2.0 integrates with Microsoft Edge TTS over a native TypeScript WebSocket connection, removing the Python and MCP server dependencies, and reads documents aloud with natural-sounding neural voices.
Version 2.0 Changes:
- Native TypeScript implementation (no Python/MCP dependencies)
- Direct WebSocket connection to Edge TTS endpoint
- Webview-based audio player (cross-platform)
- Integrated as VS Code commands
- Status bar progress feedback
Why promote to Master:
- Universal utility across all projects
- Zero-cost implementation (uses free Edge TTS API)
- No external dependencies (Python, MCP server)
- Accessibility benefits for vision-impaired users
- Integrated into VS Code extension
Dependencies (v2.0):
- `ws` npm package (WebSocket client)
- VS Code webview API (for audio playback)
## Overview
This skill provides Alex's voice synthesis capability, built on Microsoft Edge TTS via native TypeScript. It reads markdown documents, code files, and plain text aloud with natural-sounding voices and is fully integrated into the VS Code extension.
## Architecture (v2.0)
┌─────────────────────────────────────────────────────────────┐
│ Alex VS Code Extension │
├─────────────────────────────────────────────────────────────┤
│ │
│ Commands: │
│ • Alex: Read Aloud (Ctrl+Alt+R) │
│ • Alex: Read with Voice Selection │
│ • Alex: Save as Audio │
│ • Alex: Stop Reading │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ ttsService.ts │ │
│ │ Native WebSocket to Edge TTS │ │
│ │ • SSML generation │ │
│ │ • Markdown stripping │ │
│ │ • Progress callbacks │ │
│ └─────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ audioPlayer.ts │ │
│ │ Webview-based playback │ │
│ │ • Cross-platform HTML5 Audio │ │
│ │ • Play/pause/stop controls │ │
│ │ • Progress tracking │ │
│ └─────────────────────────────────────────────┘ │
│ │
└──────────────────────┬──────────────────────────────────────┘
│ WebSocket (wss://)
▼
┌─────────────────────────────────────────────────────────────┐
│ Microsoft Edge TTS Endpoint │
│ wss://speech.platform.bing.com/consumer/speech/... │
├─────────────────────────────────────────────────────────────┤
│ • 400+ neural voices, 90+ languages │
│ • Free, no API key required │
│ • MP3 output (24kHz, 48kbps) │
│ • SSML support for prosody control │
└─────────────────────────────────────────────────────────────┘
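The diagram leaves the wire format implicit. Community clients for this endpoint (for example the Python `edge-tts` package used in v1.0) treat each binary WebSocket message as a 2-byte big-endian header length, then `Name:value` header lines separated by CRLF, then raw MP3 bytes when the `Path` header is `audio`; a text message containing `Path:turn.end` marks the end of synthesis. The sketch below assumes that framing (the service is undocumented, so treat it as an assumption) and shows how a running byte count could drive the "Receiving... 45KB" progress callback. `parseBinaryFrame` and `AudioCollector` are hypothetical names.

```typescript
// Hypothetical parser for one binary Edge TTS message.
interface AudioFrame {
  headers: Record<string, string>;
  audio: Uint8Array;
}

function parseBinaryFrame(frame: Uint8Array): AudioFrame {
  if (frame.length < 2) throw new Error("frame too short");
  const headerLength = (frame[0] << 8) | frame[1]; // big-endian uint16
  if (frame.length < 2 + headerLength) throw new Error("truncated header");
  const headerText = new TextDecoder().decode(frame.subarray(2, 2 + headerLength));
  const headers: Record<string, string> = {};
  for (const line of headerText.split("\r\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) headers[line.slice(0, colon)] = line.slice(colon + 1);
  }
  return { headers, audio: frame.subarray(2 + headerLength) };
}

// Accumulates audio chunks and reports cumulative bytes for status bar progress.
class AudioCollector {
  private readonly chunks: Uint8Array[] = [];
  private received = 0;
  private readonly onProgress: (bytes: number) => void;

  constructor(onProgress: (bytes: number) => void) {
    this.onProgress = onProgress;
  }

  handleBinary(frame: Uint8Array): void {
    const { headers, audio } = parseBinaryFrame(frame);
    if (headers["Path"] !== "audio" || audio.length === 0) return;
    this.chunks.push(audio);
    this.received += audio.length;
    this.onProgress(this.received);
  }

  // Text messages carry control data; "turn.end" means no more audio will arrive.
  static isTurnEnd(textMessage: string): boolean {
    return textMessage.includes("Path:turn.end");
  }

  // Concatenate all chunks into a single MP3 buffer.
  result(): Uint8Array {
    const out = new Uint8Array(this.received);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```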
## Alex Voice Presets
| Preset | Voice ID | Character |
|---|---|---|
| Default | en-US-GuyNeural | Professional male, clear articulation |
| Warm | en-US-ChristopherNeural | Friendly, conversational |
| British | en-GB-RyanNeural | British accent, authoritative |
| Friendly | en-US-DavisNeural | Casual, approachable |
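In code, the preset table reduces to a lookup that falls back to the default voice. The sketch is hypothetical (all names are illustrative): `quickPickItems()` returns objects shaped like `vscode.QuickPickItem` (`label`, `description`) so the same data could feed the "Read with Voice Selection" picker, and `voiceLocale()` derives an SSML `xml:lang` value from the voice ID prefix.

```typescript
type PresetName = "Default" | "Warm" | "British" | "Friendly";

interface VoicePreset {
  voiceId: string;
  character: string;
}

// Mirrors the preset table above.
const VOICE_PRESETS: Record<PresetName, VoicePreset> = {
  Default: { voiceId: "en-US-GuyNeural", character: "Professional male, clear articulation" },
  Warm: { voiceId: "en-US-ChristopherNeural", character: "Friendly, conversational" },
  British: { voiceId: "en-GB-RyanNeural", character: "British accent, authoritative" },
  Friendly: { voiceId: "en-US-DavisNeural", character: "Casual, approachable" },
};

// Unknown or missing preset names fall back to Alex's default voice.
function resolveVoice(preset?: string): string {
  if (preset !== undefined && Object.prototype.hasOwnProperty.call(VOICE_PRESETS, preset)) {
    return VOICE_PRESETS[preset as PresetName].voiceId;
  }
  return VOICE_PRESETS.Default.voiceId;
}

// "en-GB-RyanNeural" -> "en-GB", usable as the SSML xml:lang attribute.
function voiceLocale(voiceId: string): string {
  return voiceId.split("-").slice(0, 2).join("-");
}

// Items shaped like vscode.QuickPickItem for showQuickPick().
function quickPickItems(): { label: string; description: string }[] {
  return (Object.keys(VOICE_PRESETS) as PresetName[]).map((name) => ({
    label: name,
    description: VOICE_PRESETS[name].character,
  }));
}
```

The own-property check keeps names like `toString` from resolving through the object prototype.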
### Voice Selection Rationale
Alex's default voice (GuyNeural) was chosen for:
- Clarity: Excellent pronunciation of technical terms
- Neutrality: Not too formal, not too casual
- Distinctiveness: Recognizable as "Alex's voice"
- Consistency: Same voice across all platforms
## VS Code Commands
### Alex: Read Aloud
Command: alex.readAloud
Keybinding: Ctrl+Alt+R (Windows/Linux), Cmd+Alt+R (macOS)
Reads the current selection or entire document aloud using Alex's default voice.
Behavior:
- If text is selected, reads only the selection
- If no selection, reads the entire document
- Markdown files are stripped of formatting for natural speech
- Progress shown in status bar
- Click status bar to stop playback
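The selection rule and the markdown check fit into one small function. This sketch uses hypothetical names: it takes plain strings instead of a `vscode.TextEditor`, assumes markdown is detected by the document's language ID `markdown`, and receives the stripping function as a parameter (something like `prepareTextForSpeech()`).

```typescript
interface ReadRequest {
  languageId: string;   // e.g. editor.document.languageId
  fullText: string;     // editor.document.getText()
  selectedText: string; // editor.document.getText(editor.selection); empty if nothing is selected
}

// Decide what to speak: the selection if there is one, otherwise the whole document.
function resolveReadText(req: ReadRequest, stripMarkdown: (md: string) => string): string {
  const raw = req.selectedText.trim().length > 0 ? req.selectedText : req.fullText;
  // Only markdown files get formatting stripped; other files are read as-is.
  return req.languageId === "markdown" ? stripMarkdown(raw) : raw;
}
```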
### Alex: Read with Voice Selection
Command: alex.readWithVoice
Quick pick to select a voice preset before reading.
### Alex: Save as Audio
Command: alex.saveAsAudio
Generates speech and saves it to an MP3 file. Opens a save dialog to choose the output location.
### Alex: Stop Reading
Command: alex.stopReading
Keybinding: Escape (when reading)
Immediately stops current playback.
## Implementation Details
### Core Files (`src/tts/`)
| File | Purpose |
|---|---|
| `ttsService.ts` | WebSocket connection, SSML generation, synthesis |
| `audioPlayer.ts` | Webview panel, playback controls, system fallback |
| `index.ts` | Module exports |
### Text Preprocessing
The `prepareTextForSpeech()` function strips markdown:
| Markdown | Speech Output |
|---|---|
| `# Heading` | "Heading." (pause) |
| `**bold**` | "bold" (emphasis via prosody) |
| `*italic*` | "italic" |
| `` `code` `` | "code" |
| `[link](url)` | "link" |
| `- item` | "Item." |
| `> quote` | "Quote: ..." |
| `---` | (long pause) |
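A regex-based sketch of these rules follows. It is not the extension's implementation of `prepareTextForSpeech()`: it drops the bold emphasis cue instead of mapping it to prosody, represents the long pause for `---` as an empty line, and handles only the asterisk forms of bold and italic so identifiers such as `snake_case` survive. Fenced code blocks and symbols are assumed to be handled in separate passes.

```typescript
// Capitalize the first character: "item" -> "Item".
const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

function prepareTextForSpeech(markdown: string): string {
  return (
    markdown
      // "---" (horizontal rule) -> empty line, read as a long pause.
      .replace(/^[ \t]*(?:-{3,}|\*{3,}|_{3,})[ \t]*$/gm, "")
      // "# Heading" -> "Heading." so the voice pauses after it.
      .replace(/^#{1,6}[ \t]+(.+)$/gm, "$1.")
      // "> quote" -> "Quote: quote"
      .replace(/^>[ \t]?(.*)$/gm, "Quote: $1")
      // "- item" -> "Item."
      .replace(/^[ \t]*[-*+][ \t]+(.+)$/gm, (_match: string, item: string) => `${capitalize(item)}.`)
      // "[link text](url)" -> "link text"
      .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
      // "**bold**" -> "bold"
      .replace(/\*\*(.+?)\*\*/g, "$1")
      // "*italic*" -> "italic"
      .replace(/\*(.+?)\*/g, "$1")
      // "`code`" -> "code"
      .replace(/`([^`]+)`/g, "$1")
  );
}
```

Line-oriented rules use `[ \t]` rather than `\s` so that, with the `m` flag, a pattern cannot swallow the newline of a preceding blank line.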
### Code Block Handling
```python
def hello():
    print("Hello")
```
Becomes: "Python code block. Definition hello. Print hello. End code block."
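That narration can be approximated by announcing the language, reducing each code line to words, and closing with an end marker. This sketch is hypothetical: `narrateCodeBlock` is an illustrative name, the keyword table maps only `def` (enough for the example), and punctuation is simply dropped, which a fuller implementation would need to treat more carefully.

```typescript
// Hypothetical keyword-to-word table; only "def" is mapped here.
const KEYWORD_WORDS = new Map<string, string>([["def", "definition"]]);

function narrateCodeBlock(language: string, code: string): string {
  const spokenLines = code
    .split("\n")
    .map((line) =>
      line
        .toLowerCase()
        .replace(/[^a-z0-9_\s]/g, " ") // drop punctuation such as ( ) : "
        .split(/\s+/)
        .filter((word) => word.length > 0)
        .map((word) => KEYWORD_WORDS.get(word) ?? word)
        .join(" ")
    )
    .filter((line) => line.length > 0)
    .map((line) => line.charAt(0).toUpperCase() + line.slice(1) + ".");
  const lang = language.charAt(0).toUpperCase() + language.slice(1);
  return [`${lang} code block.`, ...spokenLines, "End code block."].join(" ");
}
```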
### Symbol-to-Speech Transformations
Symbols are converted to natural speech equivalents:
| Symbol | Spoken As | Example |
|--------|-----------|--------|
| `~` | "approximately" or "about" | ~2 min → "about 2 minutes" |
| `&` | "and" | A & B → "A and B" |
| `@` | "at" | user@email → "user at email" |
| `%` | "percent" | 50% → "50 percent" |
| `+` | "plus" | +10% → "plus 10 percent" |
| `→` | "leads to" or "becomes" | A → B → "A becomes B" |
| `—` | (pause) | word—word → "word (pause) word" |
| `#` | (context-dependent) | #1 → "number 1"; ## → (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 → "x greater than 5" |
| `≥` / `≤` | "greater than or equal" / "less than or equal" | |
| `µ` | "micro" | µg → "microgram" |
| `°` | "degrees" | 37°C → "37 degrees Celsius" |
| `±` | "plus or minus" | ±5% → "plus or minus 5 percent" |
**Design Principle**: Would a human reading this aloud say the symbol name, or translate it to meaning? Almost always the latter.
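The table translates into an ordered list of regex rules. Order matters: `±` is rewritten before `%` and `+`, and comparison operators are converted only when surrounded by spaces, so blockquote markers (already handled by markdown stripping) are left alone. This sketch covers a subset of the table and the rule list is an assumption, not the extension's; it does not expand units, so `~2 min` becomes "about 2 min" rather than "about 2 minutes". It should run before SSML escaping, since it rewrites `&`.

```typescript
// Ordered [pattern, replacement] rules; earlier rules run first.
const SYMBOL_RULES: Array<[RegExp, string]> = [
  [/±\s*/g, "plus or minus "],
  [/~\s*(?=\d)/g, "about "],
  [/(\d)\s*%/g, "$1 percent"],
  [/\+(?=\d)/g, "plus "],
  [/\s*&\s*/g, " and "],
  [/(\w)@(\w)/g, "$1 at $2"],
  [/\s*→\s*/g, " leads to "],
  [/\s+≥\s+/g, " greater than or equal to "],
  [/\s+≤\s+/g, " less than or equal to "],
  [/\s+>\s+/g, " greater than "],
  [/\s+<\s+/g, " less than "],
  [/°C\b/g, " degrees Celsius"],
  [/°/g, " degrees"],
  [/\s*\u2014\s*/g, ", "], // em dash: replaced with a comma so the voice pauses briefly
];

function speakSymbols(text: string): string {
  return SYMBOL_RULES.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}
```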
---
## Installation (v2.0)
TTS v2 is built into the Alex VS Code extension. No separate installation required.
### Package Dependencies
The extension automatically includes:
- `ws` (WebSocket client for Edge TTS connection)
- `fs-extra` (file operations for audio saving)
### Verification
After extension update, verify TTS works:
1. Open any document
2. Press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (macOS)
3. The status bar should show "$(loading~spin) Synthesizing..."
4. Audio should play through webview panel
---
## Usage Patterns
### Read Current Document
- Press `Ctrl+Alt+R` (`Cmd+Alt+R` on macOS) to read the document aloud
- Select text first to read only the selection
### Generate Audio File
Command Palette → "Alex: Save as Audio" → choose an output location → MP3 saved
### Voice Customization
Command Palette → "Alex: Read with Voice Selection" → choose Default, Warm, British, or Friendly
---
## Edge TTS Technical Reference
### WebSocket Endpoint
`wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4&ConnectionId=[UUID]`
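A small helper can assemble this URL with a fresh connection ID per socket. This is a sketch: `buildEdgeTtsUrl` is a hypothetical name, stripping the hyphens from the UUID follows the Python `edge-tts` client's convention (an assumption here), and because the endpoint is unofficial, the parameters it requires may change.

```typescript
import { randomUUID } from "node:crypto";

const EDGE_TTS_ENDPOINT =
  "wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1";
const TRUSTED_CLIENT_TOKEN = "6A5AA1D4EAFF4E9FB37E23D68491D6F4";

// One ConnectionId per connection; hyphens stripped (assumed convention).
function buildEdgeTtsUrl(connectionId: string = randomUUID().replace(/-/g, "")): string {
  const query = new URLSearchParams({
    TrustedClientToken: TRUSTED_CLIENT_TOKEN,
    ConnectionId: connectionId,
  });
  return `${EDGE_TTS_ENDPOINT}?${query.toString()}`;
}
```

With the `ws` package, the socket would then be opened with `new WebSocket(buildEdgeTtsUrl())`.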
### Audio Format
- **Codec**: MP3
- **Sample Rate**: 24kHz
- **Bitrate**: 48kbps mono
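Community clients select this format with a `speech.config` text message before sending text, then send the SSML in a second text message with `Path:ssml`. The sketch follows the layout used by the Python `edge-tts` client (header lines as `Name:value`, a blank line, then the body); `audio-24khz-48kbitrate-mono-mp3` corresponds to the settings above. Helper names are hypothetical, the timestamp format is an assumption, and since the service is unofficial the headers it requires may change.

```typescript
const CRLF = "\r\n";

// Join "Name:value" header lines, a blank line, then the body.
function frameTextMessage(headers: Array<[string, string]>, body: string): string {
  const headerBlock = headers.map(([name, value]) => `${name}:${value}`).join(CRLF);
  return `${headerBlock}${CRLF}${CRLF}${body}`;
}

function speechConfigMessage(timestamp: string): string {
  const config = {
    context: {
      synthesis: {
        audio: {
          metadataoptions: { sentenceBoundaryEnabled: "false", wordBoundaryEnabled: "false" },
          outputFormat: "audio-24khz-48kbitrate-mono-mp3", // MP3, 24 kHz, 48 kbps, mono
        },
      },
    },
  };
  return frameTextMessage(
    [
      ["X-Timestamp", timestamp],
      ["Content-Type", "application/json; charset=utf-8"],
      ["Path", "speech.config"],
    ],
    JSON.stringify(config)
  );
}

function ssmlMessage(requestId: string, timestamp: string, ssml: string): string {
  return frameTextMessage(
    [
      ["X-RequestId", requestId],
      ["Content-Type", "application/ssml+xml"],
      ["X-Timestamp", timestamp],
      ["Path", "ssml"],
    ],
    ssml
  );
}
```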
### SSML Template
```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    <prosody rate="+0%" pitch="+0Hz" volume="+0%">
      Text content here
    </prosody>
  </voice>
</speak>
```
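Filling this template safely mostly means escaping the document text, since a stray `&` or `<` would make the SSML invalid. A sketch with hypothetical names (`buildSsml`, `Prosody`), which assumes `xml:lang` can be taken from the voice ID prefix and keeps the template's neutral prosody defaults:

```typescript
// Escape the five XML special characters so document text cannot break the SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface Prosody {
  rate?: string;   // e.g. "+25%"
  pitch?: string;  // e.g. "-2Hz"
  volume?: string; // e.g. "+0%"
}

function buildSsml(text: string, voice = "en-US-GuyNeural", prosody: Prosody = {}): string {
  const { rate = "+0%", pitch = "+0Hz", volume = "+0%" } = prosody;
  const lang = voice.split("-").slice(0, 2).join("-"); // "en-GB-RyanNeural" -> "en-GB"
  return [
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">`,
    `<voice name="${voice}">`,
    `<prosody rate="${rate}" pitch="${pitch}" volume="${volume}">`,
    escapeXml(text),
    "</prosody>",
    "</voice>",
    "</speak>",
  ].join("");
}
```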
### Popular Voice IDs
| Language | Voice | Style |
|---|---|---|
| en-US | GuyNeural | Professional male |
| en-US | JennyNeural | Professional female |
| en-US | AriaNeural | News anchor style |
| en-GB | RyanNeural | British male |
| en-GB | SoniaNeural | British female |
| en-AU | WilliamNeural | Australian male |
| en-IN | NeerjaNeural | Indian English |
## Accessibility Benefits
| Use Case | Benefit |
|---|---|
| Vision impaired | Full document access via audio |
| Multitasking | Review code while walking/driving |
| Learning | Auditory reinforcement of reading |
| Proofreading | Catch errors by hearing text |
| Long documents | Listen during breaks |
## Version History
### v2.0.0 (2026-02-06)
- Native TypeScript implementation
- Removed Python/MCP server dependencies
- Webview-based cross-platform audio player
- VS Code command integration
- Status bar progress feedback
### v1.1.0 (2026-02-05)
- Added Alex voice presets
- Enhanced markdown stripping
- Symbol to speech conversion
### v1.0.0 (2026-02-04)
- Initial implementation via MCP server
- Python edge-tts integration
- Basic markdown support
## Synapses
- accessibility: Primary use case enabler
- vscode-extension-patterns: Extension command patterns
- markdown-mermaid: Source content processing
- academic-research: Document reading for research projects
- gamma-presentations: Audio playback of pitch content for rehearsal and delivery
- project-management: Stakeholder pitch presentations generated as audio files
## Future Enhancements
| Feature | Status | Notes |
|---|---|---|
| Real-time streaming | Planned | Start playing before full generation |
| SSML support | Planned | Fine-grained prosody control |
| Section navigation | Planned | "Skip to next heading" |
| Bookmark resume | Planned | Resume from last position |
| Speed presets | Planned | 1x, 1.5x, 2x reading speeds |