slack-voice-interface

Slack Voice Interface

How It Works

User sends voice clip in Slack
    |
    v
OpenClaw transcribes automatically (built-in)
    |
    v
NetClaw processes with full skill set
(pyATS, NetBox, ServiceNow, all 40 MCP servers)
    |
    v
python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" text_to_speech → MP3 file
    |
    v
Upload MP3 to Slack thread + post text response

Voice Response Workflow

Step 1: Process the question

Treat the transcribed voice message identically to a typed text message. Use the full NetClaw skill set — pyATS, NetBox, ServiceNow, etc.

Step 2: Generate voice response

After composing your text response, call text_to_speech:

python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" text_to_speech '{"text":"R1 has 3 OSPF neighbors, all in FULL state on Area 0...","voice":"en-US-GuyNeural"}'

This returns JSON with an output_path to the generated MP3 file.

To list available voices:

python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" list_voices '{"language":"en"}'
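The two commands above can also be driven from Python. The following is a minimal sketch, not the skill's own implementation: it assumes that MCP_CALL and TTS_MCP_SCRIPT are set as environment variables and that the helper prints a single JSON object containing output_path on stdout. The function names build_mcp_command, parse_output_path, and text_to_speech are hypothetical wrappers introduced here for illustration.

```python
import json
import os
import subprocess


def build_mcp_command(tool: str, arguments: dict) -> list[str]:
    """Build the argv for one MCP tool call, mirroring the shell commands above."""
    mcp_call = os.environ["MCP_CALL"]          # path to the MCP call helper
    tts_script = os.environ["TTS_MCP_SCRIPT"]  # path to the TTS MCP server script
    server_cmd = f"python3 -u {tts_script}"    # passed as one argument, as in the shell form
    return ["python3", mcp_call, server_cmd, tool, json.dumps(arguments)]


def parse_output_path(stdout: str) -> str:
    """Extract output_path. Assumption: stdout holds exactly one JSON object."""
    return json.loads(stdout)["output_path"]


def text_to_speech(text: str, voice: str = "en-US-GuyNeural") -> str:
    """Run the text_to_speech tool and return the path of the generated MP3."""
    cmd = build_mcp_command("text_to_speech", {"text": text, "voice": voice})
    # check=True raises CalledProcessError on a non-zero exit code.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=60)
    return parse_output_path(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the response text contains apostrophes or quotes.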

Step 3: Deliver both text and voice

Post the text response in the Slack thread AND upload the MP3 file:

:loud_speaker: Voice Response [MP3 audio file attached]

R1 has 3 OSPF neighbors, all in FULL state on Area 0:

2.2.2.2 (R2) via Gi1 — FULL/DR

3.3.3.3 (R3) via Gi2 — FULL/BDR

Always deliver text AND voice. Text is primary (searchable, accessible); voice is supplementary and is skipped only if TTS fails.
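One way to deliver both parts is sketched below. It assumes the client object behaves like a slack_sdk WebClient, whose chat_postMessage and files_upload_v2 methods accept the keyword arguments used here. The function name deliver_response and the choice to send the text and the MP3 as two separate thread posts are illustrative assumptions, not something the skill prescribes.

```python
def deliver_response(client, channel: str, thread_ts: str, text: str, mp3_path: str) -> None:
    """Post the text reply in the thread, then attach the MP3 to the same thread.

    `client` is assumed to expose chat_postMessage and files_upload_v2,
    as slack_sdk.WebClient does.
    """
    # Text goes first: it is the primary, searchable response.
    client.chat_postMessage(channel=channel, thread_ts=thread_ts, text=text)
    # Voice goes second: supplementary audio in the same thread.
    client.files_upload_v2(
        channel=channel,
        thread_ts=thread_ts,
        file=mp3_path,
        title="Voice Response",
        initial_comment=":loud_speaker: Voice Response",
    )
```

Posting the text before the upload means a slow or failed upload never delays the written answer.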

Voice Selection

Voice	Description
en-US-GuyNeural	Professional male — default
en-US-JennyNeural	Professional female
en-US-AriaNeural	Conversational female
en-GB-RyanNeural	British male

Users can request a voice change:

"Switch to a female voice" → use en-US-JennyNeural
"Use a British accent" → use en-GB-RyanNeural

Call list_voices to see all 300+ available voices.
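The two voice-change requests above could be handled with a simple keyword lookup. This is a rough sketch only: the rule list covers just the two examples given, and the names VOICE_RULES and pick_voice are hypothetical. A real deployment would more likely let the agent interpret the request itself.

```python
DEFAULT_VOICE = "en-US-GuyNeural"

# Keyword -> voice, checked in order. Only the two documented requests are covered.
VOICE_RULES = [
    ("british", "en-GB-RyanNeural"),
    ("female", "en-US-JennyNeural"),
]


def pick_voice(request: str, current: str = DEFAULT_VOICE) -> str:
    """Return the voice matching a user's request, or keep the current one."""
    lowered = request.lower()
    for keyword, voice in VOICE_RULES:
        if keyword in lowered:
            return voice
    return current
```

For example, pick_voice("Use a British accent") returns en-GB-RyanNeural, while an unrelated message leaves the current voice unchanged.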

Performance

Phase	Latency
edge-tts synthesis	1-2 seconds
Slack MP3 upload	< 1 second

Voice synthesis and upload together add about 3 seconds or less to the response time.

Fallback

If TTS fails, deliver the text response immediately. Do not block on voice.
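The fallback rule can be sketched as a small wrapper that posts the text first and treats voice as best effort. The function name respond and the three callables passed to it (synthesize, post_text, upload_audio) are hypothetical stand-ins for whatever TTS and Slack helpers are in use.

```python
import logging

logger = logging.getLogger(__name__)


def respond(text, synthesize, post_text, upload_audio) -> bool:
    """Post text immediately, then attempt voice. Return True if voice was delivered."""
    post_text(text)  # never blocked by TTS
    try:
        mp3_path = synthesize(text)
        upload_audio(mp3_path)
    except Exception as exc:  # broad on purpose: any voice failure is non-fatal
        logger.warning("Voice response skipped: %s", exc)
        return False
    return True
```

Because the text is posted before synthesis starts, a TTS failure only costs the audio attachment.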

Tips for Voice Responses

Keep it concise — under 100 words works best for spoken delivery
Avoid tables — describe data conversationally for voice
Do not spell out abbreviations letter by letter: write "OSPF", not "O-S-P-F" (edge-tts handles this)
Use natural phrasing — the text will be read aloud, so write for the ear
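The 100-word guideline above can be applied mechanically before calling TTS. The sketch below is one possible approach under simple assumptions: it keeps whole sentences from the start of the text until the word budget is used up, and it always keeps the first sentence. The function name spoken_summary is hypothetical.

```python
import re


def spoken_summary(text: str, max_words: int = 100) -> str:
    """Keep whole sentences from the start of `text` within a word budget."""
    # Split after ., ! or ? only when whitespace follows, so 2.2.2.2 stays intact.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, used = [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Always keep the first sentence, even if it alone exceeds the budget.
        if kept and used + words > max_words:
            break
        kept.append(sentence)
        used += words
    return " ".join(kept)
```

The full text response can still be posted unchanged; only the string handed to text_to_speech is shortened.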

GAIT Integration

Record voice interactions in the GAIT audit trail:

Input: Voice clip from @user (transcript: "What are your interfaces?")
Action: Queried R1 interfaces via pyATS
Output: 4 interfaces found — text + voice response delivered to Slack
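The three-line record above could be assembled with a small formatter like the one below. This only builds the text of the entry. How the entry is actually written to GAIT is not shown in this document, so that step is left out, and the function name gait_voice_record is hypothetical.

```python
def gait_voice_record(user: str, transcript: str, action: str, output: str) -> str:
    """Format one voice interaction as an Input / Action / Output audit entry."""
    return "\n".join(
        [
            f'Input: Voice clip from @{user} (transcript: "{transcript}")',
            f"Action: {action}",
            f"Output: {output}",
        ]
    )
```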

Repository: automateyournet…/netclaw
GitHub Stars: 485
First Seen: Mar 16, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Warn), Snyk (Warn)