AudioEditor
AudioEditor
AI-powered audio/video editing — transcription, intelligent cut detection, automated editing with crossfades, and optional cloud polish.
Customization
Before executing, check for user customizations at:
~/.claude/PAI/USER/SKILLCUSTOMIZATIONS/AudioEditor/
If this directory exists, load and apply any PREFERENCES.md, configurations, or resources found there. These override default behavior. If the directory does not exist, proceed with skill defaults.
Voice Notification
You MUST send this notification BEFORE doing anything else when this skill is invoked.
-
Send voice notification:
curl -s -X POST http://localhost:31337/notify \ -H "Content-Type: application/json" \ -d '{"message": "Running the WORKFLOWNAME workflow in the AudioEditor skill to ACTION"}' \ > /dev/null 2>&1 & -
Output text notification:
Running the **WorkflowName** workflow in the **AudioEditor** skill to ACTION...
This is not optional. Execute this curl command immediately upon skill invocation.
Workflow Routing
| Workflow | Trigger | File |
|---|---|---|
| Clean | "clean audio", "edit audio", "remove filler words", "clean podcast", "remove ums", "cut dead air", "polish audio" | Workflows/Clean.md |
Pipeline Architecture
Audio Input
|
[Transcribe] Whisper word-level timestamps (insanely-fast-whisper on MPS)
|
[Analyze] Claude classifies each segment:
| KEEP / CUT_FILLER / CUT_FALSE_START / CUT_EDIT_MARKER / CUT_STUTTER / CUT_DEAD_AIR
| Distinguishes rhetorical emphasis from accidental repetition
|
[Edit] ffmpeg executes cuts:
| - 40ms qsin crossfades at every edit point
| - Room tone extraction and gap filling
| - Breath attenuation (50% volume, not removal)
|
[Polish] (optional) Cleanvoice API final pass:
- Mouth sound removal
- Remaining filler detection
- Loudness normalization
Output: cleaned MP3/WAV
Tools
| Tool | Command | Purpose |
|---|---|---|
| Transcribe | bun ${CLAUDE_SKILL_DIR}/Tools/Transcribe.ts <file> |
Word-level transcription via Whisper |
| Analyze | bun ${CLAUDE_SKILL_DIR}/Tools/Analyze.ts <transcript.json> |
LLM-powered edit classification |
| Edit | bun ${CLAUDE_SKILL_DIR}/Tools/Edit.ts <file> <edits.json> |
Execute cuts with crossfades + room tone |
| Polish | bun ${CLAUDE_SKILL_DIR}/Tools/Polish.ts <file> |
Cleanvoice API cloud polish |
| Pipeline | bun ${CLAUDE_SKILL_DIR}/Tools/Pipeline.ts <file> [--polish] |
Full end-to-end pipeline |
API Keys Required
| Service | Env Var | Where to Get |
|---|---|---|
| Anthropic (for analyze step) | ANTHROPIC_API_KEY |
Already set via Claude Code |
| Cleanvoice (for polish step, optional) | CLEANVOICE_API_KEY |
cleanvoice.ai Dashboard Settings API Key |
Examples
Example 1: Clean a podcast recording
User: "clean up the audio on this podcast file"
-> Invokes Clean workflow
-> Runs full pipeline: transcribe -> analyze -> edit
-> Outputs cleaned MP3 with filler words, stutters, and dead air removed
Example 2: Preview edits before applying
User: "show me what edits you'd make to this recording"
-> Invokes Clean workflow with --preview flag
-> Transcribes and analyzes, shows proposed edits without modifying audio
-> User reviews edit list, then runs again to apply
Example 3: Aggressive clean with cloud polish
User: "aggressively clean this audio and polish it"
-> Invokes Clean workflow with --aggressive --polish flags
-> Tighter thresholds for filler detection
-> Cleanvoice API pass for mouth sounds and normalization
Gotchas
- Transcription accuracy varies with audio quality. Background noise, multiple speakers, and accents reduce accuracy.
- Cut detection is heuristic-based. Always preview edits before committing — automated cuts can remove intentional pauses.
- Cloud polish uploads audio to external service. Confirm the user is okay with cloud processing for sensitive content.
Execution Log
After completing any workflow, append a single JSONL entry:
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","skill":"AudioEditor","workflow":"WORKFLOW_USED","input":"8_WORD_SUMMARY","status":"ok|error","duration_s":SECONDS}' >> ~/.claude/PAI/MEMORY/SKILLS/execution.jsonl
Replace WORKFLOW_USED with the workflow executed, 8_WORD_SUMMARY with a brief input description, and SECONDS with approximate wall-clock time. Log status: "error" if the workflow failed.
More from danielmiessler/personal_ai_infrastructure
osint
Structured OSINT investigations — people lookup, company intel, investment due diligence, entity/threat intel, domain recon, organization research using public sources with ethical authorization framework. USE WHEN OSINT, due diligence, background check, research person, company intel, investigate, company lookup, domain lookup, entity lookup, organization lookup, threat intel, discover OSINT sources.
261firstprinciples
Physics-based reasoning framework (Musk/Elon methodology) that deconstructs problems to irreducible fundamental truths rather than reasoning by analogy. Three-step structure: DECONSTRUCT (break to constituent parts and actual values), CHALLENGE (classify every element as hard constraint / soft constraint / unvalidated assumption — only physics is truly immutable), RECONSTRUCT (build optimal solution from fundamentals alone, ignoring inherited form). Outputs: constituent-parts breakdown, constraint classification table, and reconstructed solution with key insight. Three workflows: Deconstruct.md, Challenge.md, Reconstruct.md. Integrates with RedTeam (attack assumptions before deploying adversarial agents), Security (decompose threat model), Architecture (challenge design constraints), and Pentesters (decompose assumed security boundaries). Other skills invoke via: Challenge on all stated constraints → classify as hard/soft/assumption. Cross-domain synthesis: solutions from unrelated fields often apply once the fundamental truths are exposed. NOT FOR incident investigation and causal chains (use RootCauseAnalysis). NOT FOR structural feedback loops (use SystemsThinking). USE WHEN first principles, fundamental truths, challenge assumptions, is this a real constraint, rebuild from scratch, what are we actually paying for, what is this really made of, start over, physics first, question everything, reasoning by analogy, is this really necessary.
161documents
Read, write, convert, and analyze documents — routes to PDF, DOCX, XLSX, PPTX sub-skills for creation, editing, extraction, and format conversion. USE WHEN document, process file, create document, convert format, extract text, PDF, DOCX, XLSX, PPTX, Word, Excel, spreadsheet, PowerPoint, presentation, slides, consulting report, large PDF, merge PDF, fill form, tracked changes, redlining.
116redteam
Military-grade adversarial analysis that deploys 32 parallel expert agents (engineers, architects, pentesters, interns) to stress-test ideas, strategies, and plans — not systems or infrastructure. Two workflows: ParallelAnalysis (5-phase: decompose into 24 atomic claims → 32-agent parallel attack → synthesis → steelman → counter-argument, each 8 points) and AdversarialValidation (competing proposals synthesized into best solution). Context files: Philosophy.md (core principles, success criteria, agent types), Integration.md (how to combine with FirstPrinciples, Council, and other skills; output format). Targets arguments, not network vulnerabilities. Findings ranked by severity; goal is to strengthen, not destroy — weaknesses delivered with remediation paths. Collaborates with FirstPrinciples (decompose assumptions before attacking) and Council (Council debates to find paths; RedTeam attacks whatever survives). Also invoked internally by Ideate (TEST phase) and WorldThreatModel (horizon stress-testing). NOT FOR AI instruction set auditing (use BitterPillEngineering). NOT FOR network/system vulnerability testing (use a security assessment skill). USE WHEN red team, attack idea, counterarguments, critique, stress test, devil's advocate, find weaknesses, break this, poke holes, what could go wrong, strongest objection, adversarial validation, battle of bots.
115privateinvestigator
Ethical people-finding using 15 parallel research agents (45 search threads) across public records, social media, reverse lookups. Public data only, no pretexting. USE WHEN find person, locate, reconnect, people search, skip trace, reverse lookup, social media search, public records search, verify identity.
114council
Multi-agent collaborative debate that produces visible round-by-round transcripts with genuine intellectual friction. All council members are custom-composed via ComposeAgent (Agents skill) with domain expertise, unique voice, and personality tailored to the specific topic — never built-in generic types. ComposeAgent invoked as: bun run ~/.claude/skills/Agents/Tools/ComposeAgent.ts. Two workflows: DEBATE (3 rounds, full transcript + synthesis, parallel execution within rounds, 40-90 seconds total) and QUICK (1 round, fast perspective check). Context files: CouncilMembers.md (agent composition instructions), RoundStructure.md (three-round structure and timing), OutputFormat.md (transcript format templates). Agents are designed per debate topic to create real disagreement; 4-6 well-composed agents outperform 12 generic ones. Council is collaborative-adversarial (debate to find best path); for pure adversarial attack on an idea, use RedTeam instead. NOT FOR parallel task execution across agents (use Delegation skill). USE WHEN council, debate, multiple perspectives, weigh options, deliberate, get different views, multi-agent discussion, what would experts say, is there consensus, pros and cons from multiple angles.
113