analyzing-wav

Installation
SKILL.md

WAV Audio Analysis Skill

Description

Analyze WAV audio files to debug audio generation pipelines. Provides statistical analysis, format validation, and quality metrics for diagnosing issues with generated speech.

Triggers: wav, audio, waveform, samples, amplitude, audio analysis, sound quality, audio debug

Analysis Capabilities

Basic Statistics

  • Sample count and duration
  • Min/max amplitude
  • Standard deviation (expected ~3000-8000 for speech)
  • Near-silent sample percentage

Quality Indicators

  • Zero crossing rate (speech typically 50-200 per 1000 samples)
  • Clipping detection (samples at ±32767)
  • NaN/Inf detection (if processing raw floats)
  • DC offset analysis

Format Validation

  • Sample rate verification (24kHz for Qwen3-Omni TTS)
  • Bit depth check
  • Channel count
  • RIFF header validation

Usage

To analyze a WAV file, provide the path and I'll run comprehensive diagnostics:

import numpy as np

with open("audio.wav", "rb") as f:
    header = f.read(44)
    data = f.read()

samples = np.frombuffer(data, dtype=np.int16)
print(f"Samples: {len(samples)}")
print(f"Duration: {len(samples)/24000:.2f} sec")
print(f"Min/Max: {samples.min()} / {samples.max()}")
print(f"Std dev: {np.std(samples):.1f}")

# Quality check
near_silent = np.sum(np.abs(samples) < 100)
print(f"Near-silent: {100*near_silent/len(samples):.1f}%")

# Zero crossings (voice activity indicator)
if len(samples) > 1000:
    zc = np.sum(np.diff(np.sign(samples[:1000])) != 0)
    print(f"Zero crossings (first 1000): {zc}")

Typical Values for Good Speech Audio

Metric Expected Range Meaning
Std dev 3000-8000 Audio energy level
Near-silent <5% Minimal silent padding
Zero crossings 50-200/1000 Voice frequency activity
Min/Max ±20000-32000 Healthy amplitude range

Common Issues

99% Near-Silent

  • Cause: NaN values converted to zeros
  • Fix: Check for numerical overflow in pipeline

Low Std Dev (<1000)

  • Cause: Values too quiet before output normalization
  • Fix: Check gain stages, ensure proper scaling

Constant Value Runs

  • Cause: Chunked processing with context overlap issues
  • Fix: Verify chunk stitching logic

Clipping (values at ±32767)

  • Cause: Overflow or missing tanh/clamp
  • Fix: Add output clamping before int16 conversion
Related skills

More from trevors/dot-claude

Installs
2
GitHub Stars
7
First Seen
Mar 21, 2026