pipeline-test-runner
Pipeline Test Runner
Purpose
Validate that generated pipeline skills actually work by running them against real targets. Chain validation (done by chain-composer and scripts/artifact-utils.py validate-chain) checks type compatibility between steps. This skill checks execution: does the skill produce valid artifacts when given real input? The distinction matters because a chain can be type-valid and still produce empty content, crash on domain-specific inputs, or time out because of overly complex research phases.
This skill is Phase 5 of the pipeline orchestrator's 7-phase flow. Its output feeds directly into pipeline-retro (Phase 6), which traces failures back to the generator using the Three-Layer Pattern.
Operator Context
Hardcoded Behaviors (Always Apply)
- CLAUDE.md Compliance: Read and follow repository CLAUDE.md files before execution. Project instructions override default skill behaviors.
- Pipeline Spec Required: Input MUST include the Pipeline Spec JSON (same spec consumed by pipeline-scaffolder). The spec defines what subdomains exist, what skills were generated, and what scripts/references each skill expects. WHY: Without the spec, the test runner doesn't know what to test or what "success" looks like.
- Per-Subdomain Results: Every subdomain gets its own result (PASS/PARTIAL/FAIL/TIMEOUT). Never aggregate into a single pass/fail that hides individual failures. WHY: The retro skill (Phase 6) needs per-subdomain failure traces to fix the correct generator component. A blanket "FAIL" tells the retro nothing about which subdomain or which chain step broke.
- No Production Targets: Test against repo files, fixtures, or synthetic inputs only. Never invoke skills against live/external systems. WHY: Test runs happen during pipeline generation -- they must be safe, repeatable, and free of side effects.
- Artifact Validation via Script: Always use scripts/artifact-utils.py validate-manifest for manifest validation rather than manual JSON inspection. WHY: The script implements the canonical validation rules from the ADR. Manual checks will drift from the spec over time.
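The Per-Subdomain Results rule above could be modeled roughly as follows. This is a minimal sketch, not the skill's actual data model: the names `Status`, `SubdomainResult`, `failed_step`, `failures`, and `summarize` are hypothetical, and only the four status values come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    # The four outcomes named in the Per-Subdomain Results rule.
    PASS = "PASS"
    PARTIAL = "PARTIAL"
    FAIL = "FAIL"
    TIMEOUT = "TIMEOUT"


@dataclass
class SubdomainResult:
    # Hypothetical record shape; field names are assumptions for illustration.
    subdomain: str
    status: Status
    failed_step: Optional[str] = None  # chain step that broke, if known
    detail: str = ""


def failures(results: List[SubdomainResult]) -> List[SubdomainResult]:
    """Return every non-PASS result so each failure stays individually traceable."""
    return [r for r in results if r.status is not Status.PASS]


def summarize(results: List[SubdomainResult]) -> Dict[str, int]:
    """Count results per status without collapsing them into one pass/fail."""
    return {s.value: sum(1 for r in results if r.status is s) for s in Status}
```

Keeping the failing step on each record, rather than a single overall verdict, is what would let a downstream consumer such as the retro phase see which subdomain and which chain step broke.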
Default Behaviors (ON unless disabled)
- Communication Style: Report facts without self-congratulation. Show per-subdomain results table, not narrative descriptions.
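As an illustration of the per-subdomain results table, a plain-text renderer might look like the sketch below. The function name `render_results_table`, the column headings, and the example subdomain names are assumptions; the skill's real report format is not shown in this section.

```python
from typing import Dict


def render_results_table(results: Dict[str, str]) -> str:
    """Render a subdomain -> status mapping as an aligned two-column table."""
    rows = [("Subdomain", "Result")] + list(results.items())
    # Pad the first column to the widest name so the statuses line up.
    width = max(len(name) for name, _ in rows)
    return "\n".join(f"{name.ljust(width)}  {status}" for name, status in rows)


if __name__ == "__main__":
    # Hypothetical subdomain names, used only for the example.
    print(render_results_table({"auth": "PASS", "billing": "TIMEOUT"}))
```

The table lists one row per subdomain and adds no commentary, which matches the facts-only reporting style described above.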