analyzing-wav
WAV Audio Analysis Skill
Description
Analyze WAV audio files to debug audio generation pipelines. Provides statistical analysis, format validation, and quality metrics for diagnosing issues with generated speech.
Triggers: wav, audio, waveform, samples, amplitude, audio analysis, sound quality, audio debug
Analysis Capabilities
Basic Statistics
- Sample count and duration
- Min/max amplitude
- Standard deviation (expected ~3000-8000 for speech)
- Near-silent sample percentage
Quality Indicators
- Zero crossing rate (speech typically 50-200 per 1000 samples)
- Clipping detection (samples pinned at the int16 limits, +32767 or -32768)
- NaN/Inf detection (if processing raw floats)
- DC offset analysis
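The indicators above can be sketched as a small helper. This is a minimal illustration, assuming 16-bit PCM samples already loaded into a NumPy array; the function name `quality_indicators` and its return keys are hypothetical, and NaN/Inf checks are left out because they only apply to float buffers.

```python
import numpy as np

def quality_indicators(samples: np.ndarray) -> dict:
    """Clipping share, DC offset, and zero-crossing rate for int16 PCM samples."""
    s = samples.astype(np.int64)  # widen so comparisons and sums cannot overflow
    n = len(s)
    if n == 0:
        return {"clipped_pct": 0.0, "dc_offset": 0.0, "zc_per_1000": 0.0}
    # Assumption: a sample counts as clipped when it sits at or beyond +/-32767
    clipped = np.count_nonzero(np.abs(s) >= 32767)
    # Treat 0 as non-negative so a single zero sample is not counted as two crossings
    signs = s >= 0
    zc = np.count_nonzero(signs[1:] != signs[:-1])
    return {
        "clipped_pct": 100.0 * clipped / n,
        "dc_offset": float(s.mean()),  # mean far from 0 suggests a DC bias
        "zc_per_1000": 1000.0 * zc / max(n - 1, 1),
    }
```

A DC offset that is more than a small fraction of the standard deviation is usually worth investigating, though the exact threshold depends on the pipeline.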
Format Validation
- Sample rate verification (24kHz for Qwen3-Omni TTS)
- Bit depth check
- Channel count
- RIFF header validation
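One way to perform these format checks is to parse the RIFF header directly with the standard `struct` module. The sketch below is illustrative only: the function name `validate_wav_header` is hypothetical, and the defaults of mono and 16-bit are assumptions (only the 24 kHz rate comes from the list above).

```python
import struct

def validate_wav_header(raw: bytes, expected_rate: int = 24000,
                        expected_channels: int = 1, expected_bits: int = 16) -> list:
    """Return a list of problems found in a RIFF/WAVE header (empty if none)."""
    if len(raw) < 12 or raw[0:4] != b"RIFF" or raw[8:12] != b"WAVE":
        return ["not a RIFF/WAVE file"]
    pos, fmt = 12, None
    # Walk the chunk list: 4-byte id, 4-byte little-endian size, then the payload
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack_from("<4sI", raw, pos)
        if chunk_id == b"fmt ":
            if size < 16 or pos + 24 > len(raw):
                return ["truncated fmt chunk"]
            fmt = struct.unpack_from("<HHIIHH", raw, pos + 8)
            break
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    if fmt is None:
        return ["missing fmt chunk"]
    audio_format, channels, rate, _byte_rate, _block_align, bits = fmt
    problems = []
    if audio_format != 1:
        problems.append(f"unexpected format tag {audio_format} (1 = integer PCM)")
    if channels != expected_channels:
        problems.append(f"channels: {channels}, expected {expected_channels}")
    if rate != expected_rate:
        problems.append(f"sample rate: {rate}, expected {expected_rate}")
    if bits != expected_bits:
        problems.append(f"bit depth: {bits}, expected {expected_bits}")
    return problems
```

Passing the first few kilobytes of the file is normally enough, since the `fmt ` chunk usually appears near the start.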
Usage
To analyze a WAV file, provide the path and I'll run comprehensive diagnostics:
import numpy as np

SAMPLE_RATE = 24000  # assumed: 24 kHz mono, 16-bit PCM

with open("audio.wav", "rb") as f:
    header = f.read(44)  # assumes a canonical 44-byte header with no extra chunks
    data = f.read()

# Drop any trailing odd byte, then widen to int32 so abs(-32768) cannot overflow
samples = np.frombuffer(data[: len(data) // 2 * 2], dtype="<i2").astype(np.int32)

print(f"Samples: {len(samples)}")
print(f"Duration: {len(samples)/SAMPLE_RATE:.2f} sec")
print(f"Min/Max: {samples.min()} / {samples.max()}")
print(f"Std dev: {np.std(samples):.1f}")

# Quality check: share of samples below a small amplitude threshold
near_silent = np.sum(np.abs(samples) < 100)
print(f"Near-silent: {100*near_silent/len(samples):.1f}%")

# Zero crossings (voice activity indicator)
if len(samples) >= 1000:
    zc = np.sum(np.diff(np.sign(samples[:1000])) != 0)
    print(f"Zero crossings (first 1000): {zc}")
Typical Values for Good Speech Audio
| Metric | Expected Range | Meaning |
|---|---|---|
| Std dev | 3000-8000 | Audio energy level |
| Near-silent | <5% | Minimal silent padding |
| Zero crossings | 50-200/1000 | Voice frequency activity |
| Min/Max | ±20000-32000 | Healthy amplitude range |
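The ranges in the table can be turned into a quick pass/fail check. This is a rough sketch, assuming int16 speech samples in a NumPy array; the function name `check_against_expected` is hypothetical, and the thresholds are copied from the table rather than derived independently.

```python
import numpy as np

def check_against_expected(samples: np.ndarray) -> dict:
    """Compare basic metrics of int16 speech samples with the table's ranges."""
    s = samples.astype(np.int32)  # widen so abs(-32768) cannot overflow
    std = float(s.std())
    near_silent_pct = 100.0 * np.count_nonzero(np.abs(s) < 100) / len(s)
    peak = int(np.abs(s).max())
    # Sign changes in the first 1000 samples, with 0 treated as non-negative
    signs = s[:1000] >= 0
    zc = int(np.count_nonzero(signs[1:] != signs[:-1]))
    return {
        "std_dev_ok": 3000 <= std <= 8000,
        "near_silent_ok": near_silent_pct < 5,
        "zero_crossings_ok": 50 <= zc <= 200,
        "peak_ok": 20000 <= peak <= 32000,
    }
```

A failed check is a prompt to look closer, not proof of a bug: quiet speakers or long pauses can legitimately fall outside these ranges.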
Common Issues
99% Near-Silent
- Cause: NaN values converted to zeros
- Fix: Check for numerical overflow in pipeline
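To confirm this cause, it can help to count non-finite values in the float buffer before it is converted to int16. The helper below is a minimal sketch; the name `report_non_finite` is hypothetical, and it assumes the pipeline exposes its output as a NumPy float array.

```python
import numpy as np

def report_non_finite(audio: np.ndarray) -> dict:
    """Count NaN and Inf values in a float audio buffer."""
    if audio.size == 0:
        return {"nan": 0, "inf": 0, "finite_pct": 100.0}
    return {
        "nan": int(np.isnan(audio).sum()),
        "inf": int(np.isinf(audio).sum()),
        "finite_pct": 100.0 * float(np.isfinite(audio).mean()),
    }
```

Running this after each pipeline stage narrows down where the first NaN or Inf appears.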
Low Std Dev (<1000)
- Cause: Values too quiet before output normalization
- Fix: Check gain stages, ensure proper scaling
Constant Value Runs
- Cause: Chunked processing with context overlap issues
- Fix: Verify chunk stitching logic
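Constant runs can be located by finding the longest stretch of identical consecutive samples. The following is an illustrative sketch; the function name `longest_constant_run` is hypothetical, and what counts as a suspiciously long run is left to the caller.

```python
import numpy as np

def longest_constant_run(samples: np.ndarray) -> tuple:
    """Return (start_index, length) of the longest run of identical samples."""
    n = len(samples)
    if n == 0:
        return (0, 0)
    # Positions where the value differs from its predecessor start a new run
    change = np.flatnonzero(samples[1:] != samples[:-1]) + 1
    starts = np.concatenate(([0], change))
    lengths = np.diff(np.concatenate((starts, [n])))
    i = int(np.argmax(lengths))
    return (int(starts[i]), int(lengths[i]))
```

If the start index of a long run lines up with a multiple of the chunk size, that points toward the stitching logic rather than the model output itself.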
Clipping (values at ±32767)
- Cause: Overflow or missing tanh/clamp
- Fix: Add output clamping before int16 conversion
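A clamped conversion might look like the sketch below. It assumes the pipeline produces float audio nominally in [-1.0, 1.0] and scales by 32767; the function name `float_to_int16` is hypothetical, and a hard clamp is used here instead of tanh for simplicity.

```python
import numpy as np

def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Convert float audio in nominal [-1.0, 1.0] to int16, clamping overshoots."""
    if not np.all(np.isfinite(audio)):
        # Fail loudly instead of letting NaN/Inf turn into arbitrary integers
        raise ValueError("audio contains NaN or Inf")
    clamped = np.clip(audio, -1.0, 1.0)
    return np.round(clamped * 32767.0).astype(np.int16)
```

A hard clamp still distorts audio that overshoots; if many samples hit the limits after this step, the gain upstream is likely too high.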