Text-to-Speech Skill ⭐ Flagship

Domain: AI Accessibility & Communication
Inheritance: inheritable (promote to Master Alex for all heirs)
Version: 2.0.0
Last Updated: 2026-02-05
Author: Alex (Master Alex)
Status: ⭐ Flagship Skill - Core Alex capability

Why This is a Flagship Skill

Text-to-Speech gives Alex a voice. This transforms Alex from a text-only assistant into a multimodal companion that can:

Read documents aloud while you walk, drive, or rest your eyes
Proofread by ear - catch errors your eyes miss
Accessibility - full document access for vision-impaired users
Rehearsal - practice presentations with natural-sounding narration
Export knowledge - create MP3s for offline learning

Zero cost, zero dependencies - uses Microsoft Edge TTS (free, no API key) with native TypeScript.

User Experience

🎯 Quick Start: Read Any Document

Keyboard shortcut (fastest):

Open any document in VS Code
(Optional) Select specific text to read only that portion
Press Ctrl+Alt+R (Windows/Linux) or Cmd+Alt+R (macOS)
Audio begins playing through the webview player

Command palette:

Ctrl+Shift+P → "Alex: Read Aloud"

📊 Status Bar Feedback

The status bar shows real-time progress during TTS operations:

State	Display	Click Action
Connecting	`$(loading~spin) Connecting...`	-
Synthesizing	`$(loading~spin) Synthesizing...`	-
Streaming	`$(loading~spin) Receiving... 45KB`	-
Playing	`$(unmute) Playing 35%`	Stop
Paused	`$(unmute) Paused`	Stop

🎵 Webview Audio Player

A sleek panel opens with full playback controls:

┌─────────────────────────────────────────────────────────┐
│  Alex TTS Player                                    [×] │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ▶️ ⏹️   ═══════════●══════════   1:23 / 4:56          │
│                                                         │
│  🔊 ────────●────────                                   │
│                                                         │
└─────────────────────────────────────────────────────────┘

Features:

Progress bar with scrubbing (click/drag to seek)
Play/Pause button - toggle playback
Stop button - ends playback and closes panel
Volume slider - adjust playback volume
Time display - current position / total duration
Auto-close - panel closes when playback ends

🎤 Voice Selection

Choose Alex's voice before reading:

Ctrl+Shift+P → "Alex: Read with Voice Selection"
Quick pick appears:

Voice	Character	Best For
Default (GuyNeural)	Professional, clear	Technical docs, code review
Warm (ChristopherNeural)	Friendly, conversational	Tutorials, READMEs
British (RyanNeural)	Authoritative	Formal documents, presentations
Friendly (DavisNeural)	Casual, approachable	Chat logs, informal content

Select voice → reading begins immediately

💾 Save as MP3

Export any document to audio file:

Ctrl+Shift+P → "Alex: Save as Audio"
Save dialog opens (default name based on document)
Progress notification shows synthesis progress
Success notification with options:
- Open File - plays in default audio player
- Open Folder - reveals in file explorer

Use cases:

Create podcasts from documentation
Generate audio for offline learning
Archive presentations as audio

⏹️ Stop Reading

Multiple ways to stop playback:

Click status bar (shows $(unmute) icon during playback)
Press Escape when reading
Click stop button in webview player
Close webview panel
Ctrl+Shift+P → "Alex: Stop Reading"

📝 Smart Markdown Processing

Alex automatically strips markdown formatting for natural speech:

You Write	Alex Reads
`# Heading`	"Heading." (pause)
`bold text`	"bold text" (slight emphasis)
`[link text](url)`	"link text"
`code`	"code"
`> blockquote`	"Quote: ..."
`---`	(long pause)

Symbol conversion:

Symbol	Spoken As
`~5 minutes`	"about 5 minutes"
`50%`	"50 percent"
`A → B`	"A leads to B"
`±5%`	"plus or minus 5 percent"

For Master Alex (Promotion Notes)

This skill gives Alex a voice. Version 2.0 uses native TypeScript WebSocket integration with Microsoft Edge TTS, eliminating external dependencies. Reading documents aloud with natural-sounding neural voices.

Version 2.0 Changes:

Native TypeScript implementation (no Python/MCP dependencies)
Direct WebSocket connection to Edge TTS endpoint
Webview-based audio player (cross-platform)
Integrated as VS Code commands
Status bar progress feedback

Why promote to Master:

Universal utility across all projects
Zero-cost implementation (uses free Edge TTS API)
No external dependencies (Python, MCP server)
Accessibility benefits for vision-impaired users
Integrated into VS Code extension

Dependencies (v2.0):

ws npm package (WebSocket client)
VS Code webview API (for audio playback)

Overview

Alex's voice synthesis capability using Microsoft Edge TTS via native TypeScript. Enables reading markdown documents, code files, and text aloud with natural-sounding voices. Fully integrated into the VS Code extension.

Architecture (v2.0)

┌─────────────────────────────────────────────────────────────┐
│                 Alex VS Code Extension                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Commands:                                                   │
│  • Alex: Read Aloud (Ctrl+Alt+R)                            │
│  • Alex: Read with Voice Selection                          │
│  • Alex: Save as Audio                                      │
│  • Alex: Stop Reading                                       │
│                     │                                        │
│                     ▼                                        │
│  ┌─────────────────────────────────────────────┐            │
│  │           ttsService.ts                       │            │
│  │   Native WebSocket to Edge TTS               │            │
│  │   • SSML generation                          │            │
│  │   • Markdown stripping                       │            │
│  │   • Progress callbacks                       │            │
│  └─────────────────┬───────────────────────────┘            │
│                    │                                         │
│                    ▼                                         │
│  ┌─────────────────────────────────────────────┐            │
│  │           audioPlayer.ts                      │            │
│  │   Webview-based playback                     │            │
│  │   • Cross-platform HTML5 Audio               │            │
│  │   • Play/pause/stop controls                 │            │
│  │   • Progress tracking                        │            │
│  └─────────────────────────────────────────────┘            │
│                                                              │
└──────────────────────┬──────────────────────────────────────┘
                       │ WebSocket (wss://)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│               Microsoft Edge TTS Endpoint                    │
│   wss://speech.platform.bing.com/consumer/speech/...        │
├─────────────────────────────────────────────────────────────┤
│  • 400+ neural voices, 90+ languages                        │
│  • Free, no API key required                                │
│  • MP3 output (24kHz, 48kbps)                               │
│  • SSML support for prosody control                         │
└─────────────────────────────────────────────────────────────┘

Alex Voice Presets

Preset	Voice ID	Character
Default	en-US-GuyNeural	Professional male, clear articulation
Warm	en-US-ChristopherNeural	Friendly, conversational
British	en-GB-RyanNeural	British accent, authoritative
Friendly	en-US-DavisNeural	Casual, approachable

Voice Selection Rationale

Alex's default voice (GuyNeural) was chosen for:

Clarity: Excellent pronunciation of technical terms
Neutrality: Not too formal, not too casual
Distinctiveness: Recognizable as "Alex's voice"
Consistency: Same voice across all platforms

VS Code Commands

Alex: Read Aloud

Command: alex.readAloud
Keybinding: Ctrl+Alt+R (Windows/Linux), Cmd+Alt+R (macOS)

Reads the current selection or entire document aloud using Alex's default voice.

Behavior:

If text is selected, reads only the selection
If no selection, reads the entire document
Markdown files are stripped of formatting for natural speech
Progress shown in status bar
Click status bar to stop playback

Alex: Read with Voice Selection

Command: alex.readWithVoice

Quick pick to select a voice preset before reading.

Alex: Save as Audio

Command: alex.saveAsAudio

Generate and save speech to an MP3 file. Opens a save dialog for output location.

Alex: Stop Reading

Command: alex.stopReading
Keybinding: Escape (when reading)

Immediately stops current playback.

Implementation Details

Core Files (src/tts/)

File	Purpose
`ttsService.ts`	WebSocket connection, SSML generation, synthesis
`audioPlayer.ts`	Webview panel, playback controls, system fallback
`index.ts`	Module exports

Text Preprocessing

The prepareTextForSpeech() function strips markdown:

Markdown	Speech Output
`# Heading`	"Heading." (pause)
`bold`	"bold" (emphasis via prosody)
`italic`	"italic"
`code`	"code"
`[link]$url$`	"link"
`- item`	"Item."
`> quote`	"Quote: ..."
`---`	(long pause)

Code Block Handling

```python
def hello():
    print("Hello")


Becomes: "Python code block. Definition hello. Print hello. End code block."

### Symbol-to-Speech Transformations

Symbols are converted to natural speech equivalents:

| Symbol | Spoken As | Example |
|--------|-----------|--------|
| `~` | "approximately" or "about" | ~2 min → "about 2 minutes" |
| `&` | "and" | A & B → "A and B" |
| `@` | "at" | user@email → "user at email" |
| `%` | "percent" | 50% → "50 percent" |
| `+` | "plus" | +10% → "plus 10 percent" |
| `→` | "leads to" or "becomes" | A → B → "A becomes B" |
| `—` | (pause) | word—word → "word (pause) word" |
| `#` | (context-dependent) | #1 → "number 1"; ## → (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 → "x greater than 5" |
| `≥` / `≤` | "greater than or equal" / "less than or equal" | |
| `µ` | "micro" | µg → "microgram" |
| `°` | "degrees" | 37°C → "37 degrees celsius" |
| `±` | "plus or minus" | ±5% → "plus or minus 5 percent" |

**Design Principle**: Would a human reading this aloud say the symbol name, or translate it to meaning? Almost always the latter.

---

## Installation (v2.0)

TTS v2 is built into the Alex VS Code extension. No separate installation required.

### Package Dependencies

The extension automatically includes:
- `ws` (WebSocket client for Edge TTS connection)
- `fs-extra` (file operations for audio saving)

### Verification

After extension update, verify TTS works:

1. Open any document
2. Press `Ctrl+Alt+R` (Windows/Linux) or `Cmd+Alt+R` (macOS)
3. Status bar should show "$(unmute) Synthesizing..."
4. Audio should play through webview panel

---

## Usage Patterns

### Read Current Document

Press Ctrl+Alt+R to read document aloud Select text first to read only selection


### Generate Audio File

Command Palette → "Alex: Save as Audio" Choose output location → MP3 saved


### Voice Customization

Command Palette → "Alex: Read with Voice Selection" Choose: Default | Warm | British | Friendly


---

## Edge TTS Technical Reference

### WebSocket Endpoint

wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1 ?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4 &ConnectionId=[UUID]


### Audio Format

- **Codec**: MP3
- **Sample Rate**: 24kHz
- **Bitrate**: 48kbps mono

### SSML Template

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    <prosody rate="+0%" pitch="+0Hz" volume="+0%">
      Text content here
    </prosody>
  </voice>
</speak>

Popular Voice IDs

Language	Voice	Style
en-US	GuyNeural	Professional male
en-US	JennyNeural	Professional female
en-US	AriaNeural	News anchor style
en-GB	RyanNeural	British male
en-GB	SoniaNeural	British female
en-AU	WilliamNeural	Australian male
en-IN	NeerjaNeural	Indian English

Accessibility Benefits

Use Case	Benefit
Vision impaired	Full document access via audio
Multitasking	Review code while walking/driving
Learning	Auditory reinforcement of reading
Proofreading	Catch errors by hearing text
Long documents	Listen during breaks

Version History

v2.0.0 (2026-02-06)

Native TypeScript implementation
Removed Python/MCP server dependencies
Webview-based cross-platform audio player
VS Code command integration
Status bar progress feedback

v1.1.0 (2026-02-05)

Added Alex voice presets
Enhanced markdown stripping
Symbol to speech conversion

v1.0.0 (2026-02-04)

Initial implementation via MCP server
Python edge-tts integration
Basic markdown support

Synapses

accessibility: Primary use case enabler
vscode-extension-patterns: Extension command patterns
markdown-mermaid: Source content processing
academic-research: Document reading for research projects
gamma-presentations: Audio playback of pitch content for rehearsal and delivery
project-management: Stakeholder pitch presentations generated as audio files

Future Enhancements

Feature	Status	Notes
Real-time streaming	Planned	Start playing before full generation
SSML support	Planned	Fine-grained prosody control
Section navigation	Planned	"Skip to next heading"
Bookmark resume	Planned	Resume from last position
Speed presets	Planned	1x, 1.5x, 2x reading speeds

text-to-speech