Slack Voice Interface

How It Works

User sends voice clip in Slack
    |
    v
OpenClaw transcribes automatically (built-in)
    |
    v
NetClaw processes with full skill set
(pyATS, NetBox, ServiceNow, all 40 MCP servers)
    |
    v
python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" text_to_speech → MP3 file
    |
    v
Upload MP3 to Slack thread + post text response

Voice Response Workflow

Step 1: Process the question

Treat the transcribed voice message identically to a typed text message. Use the full NetClaw skill set — pyATS, NetBox, ServiceNow, etc.

Step 2: Generate voice response

After composing your text response, call text_to_speech:

python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" text_to_speech '{"text":"R1 has 3 OSPF neighbors, all in FULL state on Area 0...","voice":"en-US-GuyNeural"}'

This returns JSON with an output_path to the generated MP3 file.
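The call above can be wrapped from Python along the following lines. This is a minimal sketch, assuming MCP_CALL and TTS_MCP_SCRIPT are set in the environment as in the command above, and that the tool prints a JSON object containing output_path on stdout. The helper names and the 30-second timeout are hypothetical, not part of NetClaw.

```python
import json
import os
import subprocess


def build_tts_command(text, voice="en-US-GuyNeural", env=None):
    """Build the argv equivalent of the text_to_speech shell command."""
    env = os.environ if env is None else env
    # json.dumps handles quoting, so the text may contain quotes or newlines.
    payload = json.dumps({"text": text, "voice": voice})
    return [
        "python3",
        env["MCP_CALL"],
        f"python3 -u {env['TTS_MCP_SCRIPT']}",
        "text_to_speech",
        payload,
    ]


def parse_output_path(stdout):
    """Extract output_path from the JSON reply (assumed to arrive on stdout)."""
    reply = json.loads(stdout)
    return reply["output_path"]


def synthesize(text, voice="en-US-GuyNeural", timeout=30):
    """Run the call and return the MP3 path; raises on failure or timeout."""
    result = subprocess.run(
        build_tts_command(text, voice),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,  # assumed limit so a hung TTS call cannot block forever
    )
    return parse_output_path(result.stdout)
```

Building the payload with json.dumps avoids hand-escaping quotes inside the response text, which is easy to get wrong in a shell one-liner.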

To list available voices:

python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" list_voices '{"language":"en"}'

Step 3: Deliver both text and voice

Post the text response in the Slack thread AND upload the MP3 file:

:loud_speaker: Voice Response [MP3 audio file attached]

R1 has 3 OSPF neighbors, all in FULL state on Area 0:

  • 2.2.2.2 (R2) via Gi1 — FULL/DR
  • 3.3.3.3 (R3) via Gi2 — FULL/BDR

Always deliver text AND voice. Text is primary (searchable, accessible). Voice is supplementary.

Voice Selection

Voice               Description
en-US-GuyNeural     Professional male (default)
en-US-JennyNeural   Professional female
en-US-AriaNeural    Conversational female
en-GB-RyanNeural    British male

Users can request a voice change:

  • "Switch to a female voice" → use en-US-JennyNeural
  • "Use a British accent" → use en-GB-RyanNeural

Call list_voices to see all 300+ available voices.
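One way to map a voice-change request to a voice name is a simple keyword lookup. The sketch below covers only the two example requests above; the keyword matching and the pick_voice helper are assumptions for illustration, not how NetClaw necessarily interprets requests.

```python
DEFAULT_VOICE = "en-US-GuyNeural"

# Keyword to voice, using names from the voice list above.
# "british" is checked first so "British female" style requests stay British.
VOICE_KEYWORDS = [
    ("british", "en-GB-RyanNeural"),
    ("female", "en-US-JennyNeural"),
]


def pick_voice(request, current=DEFAULT_VOICE):
    """Return the voice implied by the request, or keep the current one."""
    lowered = request.lower()
    for keyword, voice in VOICE_KEYWORDS:
        if keyword in lowered:
            return voice
    return current
```

A request that matches no keyword leaves the current voice unchanged, so ordinary questions are never misread as voice changes.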

Performance

Phase                Latency
edge-tts synthesis   1-2 seconds
Slack MP3 upload     < 1 second

Synthesis and upload together add under 3 seconds to the response time.

Fallback

If TTS fails, deliver the text response immediately. Do not block on voice.
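The fallback rule can be sketched as posting the text first and treating audio as best effort. The three callables below (synthesize, post_text, upload_audio) are hypothetical stand-ins supplied by the caller; they are not Slack or NetClaw API names.

```python
def deliver_response(text, synthesize, post_text, upload_audio):
    """Post text first, then attach audio if synthesis and upload succeed.

    Returns True if audio was delivered, False if only text was delivered.
    """
    post_text(text)  # text is primary, so it never waits on TTS
    try:
        audio_path = synthesize(text)
        upload_audio(audio_path)
    except Exception:
        # Any TTS or upload failure degrades to text-only delivery.
        return False
    return True
```

Posting the text before attempting synthesis is one way to guarantee the user is never left waiting on audio; an alternative would be to run both concurrently.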

Tips for Voice Responses

  • Keep it concise — under 100 words works best for spoken delivery
  • Avoid tables — describe data conversationally for voice
  • Don't spell out abbreviations letter by letter: write "OSPF", not "O-S-P-F" (edge-tts handles this)
  • Use natural phrasing — the text will be read aloud, so write for the ear
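These tips can be turned into a rough pre-flight check before calling text_to_speech. The sketch below is illustrative only: the helper name, the issue labels, and the heuristics for spotting tables (pipe or tab characters) and letter-by-letter spellings such as O-S-P-F are assumptions.

```python
import re

MAX_SPOKEN_WORDS = 100  # from the "under 100 words" tip

# Three or more single capital letters joined by hyphens, e.g. "O-S-P-F".
SPELLED_OUT = re.compile(r"\b(?:[A-Z]-){2,}[A-Z]\b")


def spoken_text_issues(text):
    """Return a list of reasons the text may not read well aloud."""
    issues = []
    if len(text.split()) >= MAX_SPOKEN_WORDS:
        issues.append("too long")
    if "|" in text or "\t" in text:
        issues.append("looks tabular")
    if SPELLED_OUT.search(text):
        issues.append("spelled-out abbreviation")
    return issues
```

An empty list means none of the heuristics fired; it does not guarantee the text sounds natural.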

GAIT Integration

Record voice interactions in the GAIT audit trail:

Input: Voice clip from @user (transcript: "What are your interfaces?")
Action: Queried R1 interfaces via pyATS
Output: 4 interfaces found — text + voice response delivered to Slack
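
A record in this shape could be assembled with a small formatter. This sketch only builds the three-line text shown above; how the entry is actually written to GAIT is not covered here, and the function name is hypothetical.

```python
def format_gait_entry(user, transcript, action, output):
    """Build the Input/Action/Output record for a voice interaction."""
    return "\n".join([
        f'Input: Voice clip from @{user} (transcript: "{transcript}")',
        f"Action: {action}",
        f"Output: {output}",
    ])
```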
First seen: Mar 16, 2026