# Article Evaluation Pipeline Skill

## Operator Context

This skill acts as an operator for voice authenticity evaluation, configuring Claude's behavior for deterministic validation combined with wabi-sabi-aware analysis. It implements the Pipeline architectural pattern -- Fetch, Validate, Analyze, Report -- with Domain Intelligence embedded in voice authenticity classification.
## Hardcoded Behaviors (Always Apply)

- CLAUDE.md Compliance: Read and follow repository CLAUDE.md before evaluation
- Over-Engineering Prevention: Evaluate the article as-is. No speculative corrections, no "while I'm here" rewrites
- Deterministic Validation Required: Always use `scripts/voice_validator.py` for pattern matching. Never self-assess voice quality
- Wabi-Sabi Awareness: Natural imperfections are FEATURES, not bugs. See `skills/shared-patterns/wabi-sabi-authenticity.md`
- Em-Dash Zero Tolerance: Em-dashes are an absolute prohibition, always flagged as errors
- No False Positives on Authenticity: Do NOT flag typos, run-ons, fragments, self-corrections, or trailing thoughts as errors
- Artifact Persistence: Save the evaluation report to a file, not just context
## Default Behaviors (ON unless disabled)

- Full Pipeline Execution: Run all 4 phases: FETCH -> VALIDATE -> ANALYZE -> REPORT
- Voice Auto-Detection: Detect voice based on source context or an explicit `--voice` flag
- Wabi-Sabi Report Section: Include dedicated analysis of intentional imperfections
- Banned Pattern Check: Run zero-tolerance check for AI tells alongside voice validation
- Line Number Attribution: Report all findings with specific line numbers
- Artifact Saving: Save fetched content to `/tmp/article-evaluation-[timestamp].md`
## Optional Behaviors (OFF unless enabled)

- Quick Mode: Skip wabi-sabi analysis, only run validators (`--quick`)
- Fix Suggestions: Generate revision suggestions for failing content (`--suggest-fixes`)
- Specific Voice Override: Force a voice profile instead of auto-detecting (`--voice {name}`)
## What This Skill CAN Do
- Evaluate articles for voice authenticity through deterministic validation
- Classify imperfections as wabi-sabi markers (keep) vs actual violations (fix)
- Run banned pattern checks with zero tolerance for AI tells
- Generate comprehensive evaluation reports with line-level attribution
- Auto-detect which voice profile to validate against
## What This Skill CANNOT Do
- Write or generate articles (use voice-orchestrator or research-to-article instead)
- Edit or fix articles (use anti-ai-editor instead)
- Create voice profiles (use voice-calibrator instead)
- Skip the validation phase and self-assess quality
- Run without `scripts/voice_validator.py` available
## Instructions

### Phase 1: FETCH
Goal: Obtain article content in a format suitable for validation.
**Step 1: Identify source**

Determine whether input is a URL or local file path.

**Step 2: Fetch content**

For URLs: Use WebFetch or curl to retrieve content, extract article body as markdown. For local files: Read the file directly.

**Step 3: Save artifact**

Save content to `/tmp/article-evaluation-[timestamp].md` for subsequent phases.
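In sketch form, assuming a plain shell with `curl` available (WebFetch is a tool call, not a CLI, so it is not shown); the timestamp format is an illustrative choice, not one the skill prescribes:

```bash
# Minimal sketch of Steps 1-3, assuming curl is available.
source="https://example.com/posts/my-article/"   # or a local path
ts=$(date +%Y%m%d-%H%M%S)
artifact="/tmp/article-evaluation-${ts}.md"

if [ -f "$source" ]; then
  cp "$source" "$artifact"            # local file: read directly
else
  curl -fsSL "$source" > "$artifact"  # URL: fetch; markdown extraction of the body not shown
fi
```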
**Step 4: Detect voice**

Detect voice profile from source or context (a mapping sketch follows the list):

- Known domain/path -> mapped voice profile
- Unknown -> require explicit `--voice` flag
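A minimal sketch of the mapping; the domain/profile pairs below are placeholders, not real mappings from this toolkit:

```bash
# Hypothetical mapping sketch; real domain -> profile pairs live elsewhere.
case "$source" in
  *example.com*) voice="example-voice" ;;
  *myblog*)      voice="personal-blog" ;;
  *)             echo "Unknown source: pass --voice explicitly" >&2; exit 1 ;;
esac
```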
Gate: Article content saved to temp file AND voice profile identified. Proceed only when gate passes.
### Phase 2: VALIDATE
Goal: Run deterministic validation against voice profile and banned patterns.
**Step 1: Voice pattern validation**
```bash
python3 $HOME/claude-code-toolkit/scripts/voice_validator.py validate \
  --content /tmp/article-evaluation-[timestamp].md \
  --voice [voice-name] \
  --format json
```
Pass criteria: Score >= 60, zero hard errors.
**Step 2: Banned pattern check**
```bash
python3 $HOME/claude-code-toolkit/scripts/voice_validator.py check-banned \
  --content /tmp/article-evaluation-[timestamp].md
```
Pass criteria: Score = 100 (no banned patterns found).
**Step 3: Record results**
Capture both scores, all errors, and all warnings with line numbers.
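A sketch of the capture step, assuming `jq` is installed; the `.score` field name is an assumption about the validator's JSON output, not a documented schema:

```bash
# Sketch only: .score is an assumed field name, and check-banned output
# is captured verbatim since its format is not specified here.
voice_json=$(python3 $HOME/claude-code-toolkit/scripts/voice_validator.py validate \
  --content "$artifact" --voice "$voice" --format json)
voice_score=$(echo "$voice_json" | jq '.score')

banned_out=$(python3 $HOME/claude-code-toolkit/scripts/voice_validator.py check-banned \
  --content "$artifact")
```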
Gate: Both validation runs complete with captured output. Proceed only when gate passes.
### Phase 3: ANALYZE (Wabi-Sabi)
Goal: Classify imperfections as authentic markers or actual violations.
**Step 1: Scan for imperfections**
Review content for all deviations from "perfect" writing: typos, run-ons, fragments, self-corrections, trailing thoughts, casual contractions.
**Step 2: Classify each finding**
For each imperfection found, classify as:
- WABI-SABI (KEEP): Intentional imperfection matching the writer's authentic patterns
- ERROR (FIX): Actual voice violation or banned pattern
- WARNING (REVIEW): Minor rhythm or pattern issue worth noting
Use `references/wabi-sabi-classification.md` for the full classification guide and decision tree.
**Step 3: Check for suspicious perfection**
Zero wabi-sabi markers is itself a red flag. If no markers found, note this as suspicious -- authentic writing always contains imperfections.
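In sketch form, with `marker_count` as an assumed tally of wabi-sabi markers from Step 2:

```bash
# Zero markers is itself a red flag; marker_count is an assumed variable name
if [ "${marker_count:-0}" -eq 0 ]; then
  echo "SUSPICIOUS: no wabi-sabi markers found -- authentic writing always has some" >&2
fi
```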
Gate: All imperfections classified with line numbers and rationale. Proceed only when gate passes.
### Phase 4: REPORT
Goal: Generate comprehensive evaluation report.
**Step 1: Compile findings**
Aggregate validation scores, wabi-sabi markers, errors, and warnings into the report structure defined in `references/report-template.md`.
**Step 2: Determine verdict**
| Verdict | Conditions |
|---|---|
| AUTHENTIC | Voice >= 60, banned = 100, wabi-sabi markers present |
| NEEDS WORK | Voice >= 60, banned < 100 (minor violations) |
| FAILED | Voice < 60, or major banned pattern violations |
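The table reduces to a simple conditional. A sketch, assuming integer scores in `voice_score` and `banned_score` plus the `marker_count` tally from Phase 3 (all assumed variable names):

```bash
# Verdict logic from the table above; integer scores assumed.
# A perfect-but-markerless article falls through to NEEDS WORK,
# consistent with the suspicious-perfection check in Phase 3.
if [ "$voice_score" -ge 60 ] && [ "$banned_score" -eq 100 ] && [ "$marker_count" -gt 0 ]; then
  verdict="AUTHENTIC"
elif [ "$voice_score" -ge 60 ]; then
  verdict="NEEDS WORK"
else
  verdict="FAILED"
fi
```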
**Step 3: Write recommendations**
For NEEDS WORK or FAILED verdicts, list specific items to fix with line numbers. For AUTHENTIC, note what makes it work.
**Step 4: Output report**
Display report to user and save to file if requested.
Gate: Complete report generated with verdict, scores, and recommendations. Evaluation complete.
## Examples

### Example 1: URL Evaluation

User says: "Evaluate this article https://example.com/posts/my-article/"

Actions:
- Fetch article content, save to temp file, detect voice from context (FETCH)
- Run voice_validator.py validate + check-banned (VALIDATE)
- Classify imperfections as wabi-sabi or violations (ANALYZE)
- Generate report with verdict (REPORT)

Result: Comprehensive evaluation with AUTHENTIC/NEEDS WORK/FAILED verdict
### Example 2: Local File Quick Check

User says: "Quick check on ~/myblog/content/posts/draft.md"

Actions:
- Read local file, save to temp, detect voice from context (FETCH)
- Run both validators (VALIDATE)
- Skip wabi-sabi analysis (quick mode) (ANALYZE skipped)
- Generate abbreviated report with scores only (REPORT)

Result: Fast pass/fail with scores, no wabi-sabi breakdown
## Error Handling

### Error: "Voice validator script not found"

Cause: `scripts/voice_validator.py` not at expected path or not executable

Solution:

- Verify path: `ls $HOME/claude-code-toolkit/scripts/voice_validator.py`
- Check permissions: `chmod +x` if needed
- If missing, cannot proceed -- deterministic validation is non-negotiable
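A fail-fast preflight along these lines can run before Phase 1:

```bash
# Abort early if the hard dependency is missing
validator="$HOME/claude-code-toolkit/scripts/voice_validator.py"
if [ ! -f "$validator" ]; then
  echo "voice_validator.py not found at $validator; cannot proceed" >&2
  exit 1
fi
```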
Error: "Cannot determine voice profile"
Cause: Source does not match any known site mapping and no --voice flag provided
Solution:
- Ask user which voice to validate against
- Use
--voice {name}explicit flag - Do NOT guess -- wrong profile produces meaningless scores
Error: "Article content empty or too short"
Cause: WebFetch failed, URL is paywalled, or file path incorrect Solution:
- Verify URL is accessible (check for paywalls, auth walls)
- Try alternative fetch method (curl vs WebFetch)
- Ask user to provide content directly if URL inaccessible
## Anti-Patterns

### Anti-Pattern 1: Flagging Wabi-Sabi as Errors

What it looks like: Reporting typos, run-ons, and fragments as issues to fix

Why wrong: These are authenticity markers. Fixing them makes content more synthetic.

Do instead: Classify using the wabi-sabi decision tree. Only flag items on the banned list.
### Anti-Pattern 2: Expecting Perfect Scores

What it looks like: Treating 70/100 as a failing score, aiming for 95+

Why wrong: Over-polished content is an AI tell. A score of 70-90 with wabi-sabi markers is more authentic than 95+ with none.

Do instead: Pass threshold is 60. Expect authentic articles in the 70-90 range.
### Anti-Pattern 3: Self-Assessing Voice Quality
What it looks like: "This sounds like the target voice to me" without running the validator
Why wrong: LLM assessment is inconsistent and biased. Deterministic scripts are reproducible.
Do instead: Always run `voice_validator.py`. Trust the script over your judgment.
### Anti-Pattern 4: Skipping Wabi-Sabi Analysis

What it looks like: Running validators only and reporting scores without imperfection analysis

Why wrong: Misses the key insight -- whether imperfections are features or bugs. Two articles with 75/100 can be very different.

Do instead: Complete Phase 3 unless explicitly in quick mode.
### Anti-Pattern 5: Fixing Articles During Evaluation

What it looks like: "Let me also fix these issues I found" during evaluation

Why wrong: Evaluation and editing are separate workflows. Mixing them loses objectivity.

Do instead: Report findings only. If fixes are needed, redirect to the anti-ai-editor skill.
## References
This skill uses these shared patterns:
- Anti-Rationalization - Prevents shortcut rationalizations
- Verification Checklist - Pre-completion checks
- Wabi-Sabi Authenticity - Imperfection classification principles
- Pipeline Architecture - Pipeline design patterns
## Domain-Specific Anti-Rationalization
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "I can tell it's authentic without the validator" | Subjective assessment is unreliable | Run voice_validator.py, trust the script |
| "The typos are errors, I should flag them" | Typos in natural positions are wabi-sabi markers | Classify with the decision tree first |
| "Score is close to 60, probably fine" | Probably is not proven | Report exact score, let threshold decide |
| "No need for wabi-sabi analysis, scores tell the story" | Scores miss the authenticity texture | Complete Phase 3 unless quick mode |
| "I'll fix the issues while evaluating" | Evaluation and editing are separate concerns | Report only, redirect to anti-ai-editor |
## Reference Files

- `${CLAUDE_SKILL_DIR}/references/report-template.md`: Full report format with verdict criteria and examples
- `${CLAUDE_SKILL_DIR}/references/wabi-sabi-classification.md`: Complete marker tables and classification decision tree