diagnostic-issue-resolver
Diagnostic Issue Resolver
Diagnose and fix common TTS + Telegram bot issues through systematic symptom collection, automated diagnostics, and targeted fixes.
Platform: macOS (Apple Silicon)
Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters have drifted, or a workaround was needed, fix this file immediately rather than deferring. Only update for real, reproducible issues.
When to Use This Skill
- TTS audio is not playing or sounds wrong
- Telegram bot is not responding to messages
- Kokoro engine errors or timeouts
- Lock file appears stuck
- Audio plays twice (race condition)
- MLX Metal acceleration is not working
- Queue appears full or backed up
Requirements
- Access to ~/.claude/automation/claude-telegram-sync/ (bot source)
- Access to ~/.local/share/kokoro/ (Kokoro engine)
- Access to ~/.local/state/launchd-logs/telegram-bot/ (launchd logs)
- Access to ~/.claude/automation/claude-telegram-sync/logs/audit/ (NDJSON audit)
Known Issue Table
| Issue | Likely Cause | Diagnostic | Fix |
|---|---|---|---|
| No audio output | Stale TTS lock | stat /tmp/kokoro-tts.lock | rm -f /tmp/kokoro-tts.lock |
| Bot not responding | Process crashed | pgrep -la 'bun.*src/main.ts' | Restart: cd ~/.claude/automation/claude-telegram-sync && bun --watch run src/main.ts |
| Kokoro timeout | First-run model load | Check ~/.cache/huggingface/ | Wait for download, or re-run kokoro-install.sh --install |
| Queue full | Rapid-fire notifications | Check queue depth in audit log | Increase TTS_MAX_QUEUE_DEPTH in mise.toml or drain queue |
| Lock stuck forever | Heartbeat process died | stat /tmp/kokoro-tts.lock + pgrep -x afplay | If lock stale >30s AND no audio process, rm lock |
| Slow MLX acceleration | Wrong Python or deps | python -c "from mlx_audio.tts.utils import load_model; print('MLX OK')" | Reinstall via kokoro-install.sh --upgrade |
| Double audio playback | Lock race condition | Check for multiple afplay processes | Kill all: pkill -x afplay, then restart |
Workflow Phases
Phase 1: Symptom Collection
Use AskUserQuestion to understand what the user is experiencing. Key questions:
- What happened? (no audio, wrong audio, bot silent, error message)
- When did it start? (after upgrade, suddenly, always)
- What were you doing? (clipboard read, Telegram notification, manual TTS)
Phase 2: Automated Diagnostics
Based on symptoms, run the relevant subset of these checks:
# Lock state
ls -la /tmp/kokoro-tts.lock 2>/dev/null && stat -f "%Sm" /tmp/kokoro-tts.lock || echo "No lock file"
# Audio processes
pgrep -la afplay; pgrep -la say
# Bot process
pgrep -la 'bun.*src/main.ts'
# Kokoro health
~/.local/share/kokoro/.venv/bin/python -c "from mlx_audio.tts.utils import load_model; print('MLX-Audio OK')"
# Recent errors in audit log
tail -20 ~/.claude/automation/claude-telegram-sync/logs/audit/*.ndjson 2>/dev/null | grep -i error
# Recent bot console output
tail -50 /private/tmp/telegram-bot.log 2>/dev/null | grep -i -E '(error|fail|timeout)'
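The audit-log check above only inspects the last 20 lines of each file. When it is unclear which log holds the problem, a per-file error count can narrow the search. The following is a minimal sketch: the function name count_audit_errors is hypothetical, and it assumes, as the grep above does, that error entries contain the string "error" (case-insensitive). The actual audit schema is not described here.

```shell
#!/usr/bin/env bash
# Sketch: print "<count><TAB><file>" for every NDJSON audit log in a directory.
# count_audit_errors is an illustrative helper, not part of the bot.
count_audit_errors() {
  local dir="$1" f
  for f in "$dir"/*.ndjson; do
    # Skip the unexpanded glob when the directory has no .ndjson files.
    [ -e "$f" ] || continue
    # grep -c prints the number of matching lines (0 when none match).
    printf '%s\t%s\n' "$(grep -ci 'error' "$f")" "$(basename "$f")"
  done
}
```

Possible usage: count_audit_errors ~/.claude/automation/claude-telegram-sync/logs/audit, then open the file with the highest count.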
Phase 3: Root Cause Analysis
Map diagnostic output to the Known Issue Table above. Common patterns:
- Lock file exists + mtime > 30s ago + no afplay = stale lock
- No bot PID found = bot crashed
- from mlx_audio.tts.utils import load_model fails = MLX-Audio broken
- Multiple afplay PIDs = race condition
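The stale-lock rule in the patterns above (lock exists, mtime older than 30s, no afplay running) can be sketched as a small classifier. This is an illustration only: the names lock_state, file_mtime, audio_playing, and STALE_AFTER_SECS are hypothetical, and the bot's own stale detection may work differently.

```shell
#!/usr/bin/env bash
# Sketch: classify the TTS lock as "none", "active", or "stale".
# STALE_AFTER_SECS mirrors the 30s threshold from the Known Issue Table.
STALE_AFTER_SECS="${STALE_AFTER_SECS:-30}"

# Print a file's mtime as epoch seconds (BSD stat on macOS, GNU stat elsewhere).
file_mtime() {
  if [ "$(uname)" = "Darwin" ]; then
    stat -f %m "$1"
  else
    stat -c %Y "$1"
  fi
}

# Succeeds when an afplay process is running.
audio_playing() {
  pgrep -x afplay >/dev/null 2>&1
}

lock_state() {
  local lock="$1" now mtime age
  if [ ! -e "$lock" ]; then
    echo "none"
    return
  fi
  now=$(date +%s)
  mtime=$(file_mtime "$lock")
  age=$(( now - mtime ))
  # Stale only when the lock is old AND nothing is playing audio.
  if [ "$age" -gt "$STALE_AFTER_SECS" ] && ! audio_playing; then
    echo "stale"
  else
    echo "active"
  fi
}
```

Possible usage: lock_state /tmp/kokoro-tts.lock. Only a "stale" result would justify removing the lock. An "active" result with no audible output points elsewhere in the table.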
Phase 4: Fix Application
Apply the targeted fix from the Known Issue Table. Always use the least disruptive fix first.
Phase 5: Verification
After applying the fix, verify the issue is resolved:
# Quick TTS test
~/.local/share/kokoro/.venv/bin/python ~/.local/share/kokoro/tts_generate.py \
--text "Diagnostic test complete" --voice af_heart --lang en-us --speed 1.0 \
--output /tmp/kokoro-tts-diag-test.wav && afplay /tmp/kokoro-tts-diag-test.wav && echo "OK"
# Full health check
~/eon/cc-skills/plugins/tts-tg-sync/scripts/kokoro-install.sh --health
TodoWrite Task Templates
1. [Symptoms] Collect symptoms via AskUserQuestion
2. [Triage] Map symptoms to likely causes
3. [Lock] Check TTS lock state (mtime, PID, stale detection)
4. [Process] Check bot process and audio processes
5. [Kokoro] Verify Kokoro venv and MLX-Audio availability
6. [Logs] Check recent audit logs for errors
7. [Fix] Apply targeted fix for identified root cause
8. [Verify] Run health check to confirm resolution
Post-Change Checklist
- Root cause identified and documented
- Fix applied successfully
- Health check passes
- Test audio plays correctly
- No stale locks or orphan processes remain
Troubleshooting
This skill IS the troubleshooting skill. If the standard diagnostics do not identify the issue:
- Check the full bot console log: cat /private/tmp/telegram-bot.log
- Check all NDJSON audit logs: ls -lt ~/.claude/automation/claude-telegram-sync/logs/audit/
- Check system audio: afplay /System/Library/Sounds/Tink.aiff (if this fails, it is a macOS audio issue, not TTS)
- Run a manual Kokoro generation outside the bot to isolate the problem
- If all else fails, do a full teardown and reinstall using clean-component-removal then full-stack-bootstrap
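The isolation steps above can be chained so that the first failing layer is reported. The sketch below reuses the commands and flags already shown in this skill. The function name isolate_failure and the variables KOKORO_PY, KOKORO_SCRIPT, and TEST_WAV are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report the first layer that fails, from system audio up to Kokoro playback.
KOKORO_PY="${KOKORO_PY:-$HOME/.local/share/kokoro/.venv/bin/python}"
KOKORO_SCRIPT="${KOKORO_SCRIPT:-$HOME/.local/share/kokoro/tts_generate.py}"
TEST_WAV="${TEST_WAV:-/tmp/kokoro-tts-diag-test.wav}"

isolate_failure() {
  # Layer 1: macOS audio, independent of TTS.
  if ! afplay /System/Library/Sounds/Tink.aiff >/dev/null 2>&1; then
    echo "system-audio"
    return 1
  fi
  # Layer 2: Kokoro generation, run outside the bot.
  if ! "$KOKORO_PY" "$KOKORO_SCRIPT" --text "Diagnostic test" --voice af_heart \
       --lang en-us --speed 1.0 --output "$TEST_WAV" >/dev/null 2>&1; then
    echo "kokoro-generation"
    return 1
  fi
  # Layer 3: playback of the generated file.
  if ! afplay "$TEST_WAV" >/dev/null 2>&1; then
    echo "playback"
    return 1
  fi
  echo "ok"
}
```

If isolate_failure prints "ok" while the bot is still silent, the fault is more likely in the bot process or its queue than in audio or Kokoro.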
Reference Documentation
- Common Issues -- Expanded diagnostic procedures for each known issue
- Lock Debugging -- Deep dive into the two-layer lock mechanism
- Evolution Log -- Change history for this skill
Post-Execution Reflection
After this skill completes, reflect before closing the task:
- Locate yourself. — Find this SKILL.md's canonical path (Glob for this skill's name) before editing. All corrections target THIS file and its sibling references/ — never other documentation.
- What failed? — Fix the instruction that caused it. If it could recur, add it as an anti-pattern.
- What worked better than expected? — Promote it to recommended practice. Document why.
- What drifted? — Any script, reference, or external dependency that no longer matches reality gets fixed now.
- Log it. — Every change gets an evolution-log entry with trigger, fix, and evidence.
Do NOT defer. The next invocation inherits whatever you leave behind.