skill-auditor
Skill Auditor
Portfolio-level skill routing analysis and optimization. Analyzes real session transcripts to find routing errors, attention competition, and coverage gaps, then generates an interactive HTML report.
Prerequisites
- pip install tiktoken (optional; falls back to character-based estimation)
- No external API keys required. Analysis uses Claude sub-agents.
Workflow
Run all steps sequentially. The coordinator (you) manages data flow between scripts and sub-agents.
Step 0: Initial Questions
Before starting, ask the user two questions using AskUserQuestion:
- Report language: "Which language should the report be written in? (e.g. 日本語, English, 中文, ...)" Free-text input; default to the user's conversation language if not specified.
- Scope: "What scope should the analysis cover?" Options: Cross-project (all projects) / Current project only
Store these choices. Pass the language choice to all sub-agents as an instruction prefix: "Write all output text (health_assessment, detail, reason, suggested_fix, etc.) in [chosen language]."
For cross-project mode, use "all" as the project_path argument to collect_transcripts.py in Step 1.
For current-project mode, use --cwd "$(pwd)" instead.
Step 1: Detect Project
If cross-project mode was selected:
```bash
python3 scripts/collect_transcripts.py all --days 14 \
  --output <workspace>/transcripts.json --verbose
```
If current-project mode:
```bash
python3 scripts/collect_transcripts.py --cwd "$(pwd)" --days 14 \
  --output <workspace>/transcripts.json --verbose
```
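If auto-detection fails, show the list of available projects (the --list option described under Troubleshooting) and ask the user which project to audit.
For cross-project mode, the base dir is ~/.claude/skill-report/.
For current-project mode, the base dir is <project>/.claude/skill-report/.
<workspace> in the commands above is the per-run directory under this base dir. Set it up as described in Step 2 before running the collector.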
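The scope answer from Step 0 decides both the collector arguments and the base directory. Below is a minimal Python sketch of that mapping. It assumes the answer is stored as "cross" or "current". Both helper names and the scope strings are hypothetical and are not part of the skill's scripts.

```python
import os


def base_dir_for(scope, cwd):
    """Base directory for run output; scope is "cross" or "current" (hypothetical encoding)."""
    if scope == "cross":
        return os.path.expanduser("~/.claude/skill-report")
    if scope == "current":
        return os.path.join(cwd, ".claude", "skill-report")
    raise ValueError(f"unknown scope: {scope!r}")


def collector_argv(scope, cwd, workspace, days=14):
    """Argument list for collect_transcripts.py, mirroring the two commands above."""
    if scope == "cross":
        target = ["all"]  # project_path argument for cross-project mode
    elif scope == "current":
        target = ["--cwd", cwd]
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return [
        "python3", "scripts/collect_transcripts.py", *target,
        "--days", str(days),
        "--output", os.path.join(workspace, "transcripts.json"),
        "--verbose",
    ]
```

Building the command as an argument list (for example, for subprocess.run) avoids shell-quoting problems when the project path contains spaces.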
Step 2: Set Up Workspace
Each run gets a timestamped subdirectory so multiple runs never collide:
```bash
RUN_ID=$(date +%Y-%m-%dT%H-%M-%S)
WORKSPACE=<base_dir>/${RUN_ID}
mkdir -p "${WORKSPACE}"
```
Use ${WORKSPACE} as <workspace> in all subsequent steps.
health-history.json stays at <base_dir>/health-history.json (shared
across runs — see Step 8).
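Here is the same run-directory layout in Python, as a hedged sketch with illustrative helper names. One difference from the shell version is deliberate. mkdir -p silently reuses the directory if two runs start within the same second. os.makedirs(..., exist_ok=False) raises FileExistsError in that case.

```python
import os
from datetime import datetime


def run_workspace_path(base_dir, now=None):
    """Per-run directory named like the shell RUN_ID, e.g. 2026-03-04T18-45-23."""
    now = now or datetime.now()
    return os.path.join(base_dir, now.strftime("%Y-%m-%dT%H-%M-%S"))


def create_run_workspace(base_dir, now=None):
    """Create the run directory (and base_dir if needed); fail if it already exists."""
    path = run_workspace_path(base_dir, now)
    os.makedirs(path, exist_ok=False)  # FileExistsError on a same-second clash
    return path


def health_history_path(base_dir):
    """Shared across runs, so it lives in base_dir rather than the run directory."""
    return os.path.join(base_dir, "health-history.json")
```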
Step 3: Collect Data
Run the skill collector. Together with transcripts.json from Step 1, its output forms the input for the analysis.
```bash
# Transcripts already collected in Step 1
python3 scripts/collect_skills.py \
  --output <workspace>/skill-manifest.json --verbose
```
Report the collection summary to the user: "N sessions, M user turns, K skills found. Attention budget: T tokens total."
Step 4: Routing Audit (Sub-agents)
Spawn one or more routing-analyst sub-agents. Each sub-agent:
- Reads agents/routing-analyst.md for its analysis rubric
- Reads the skill manifest, evaluating only the skills visible to its batch
- Reads a batch of transcripts
- Writes its analysis to a batch JSON file
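IMPORTANT: Project-aware batching. A batch may only be judged against skills its
sessions could actually see. For that reason, projects with local skills are batched
separately from global-only projects. Projects with only global skills are pooled
together (they see the same skill set). Projects that share an identical set of local
skills are pooled with each other. Batches start at up to 60 sessions each (batch_size).
When many projects have unique local skills, the total is capped at MAX_BATCHES
(default 12), and excess local batches are merged greedily: the smallest batch (by
session count) is folded into the batch whose local skill set differs least from its
own. This adds a few extra skills to visible_skill_names, and merged batches can grow
past 60 sessions, but the sub-agent count stays bounded. Global-only batches are never
merged, so a very large pool of global-only sessions can still exceed the cap.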
```python
import json
from collections import defaultdict

WORKSPACE = "<workspace>"  # the run directory created in Step 2

with open(f"{WORKSPACE}/transcripts.json") as f:
    data = json.load(f)
with open(f"{WORKSPACE}/skill-manifest.json") as f:
    manifest = json.load(f)
sessions = data["sessions"]

# Identify global skills and project-local skills
global_skills = [s for s in manifest["skills"] if s["scope"] == "global"]
global_names = [s["name"] for s in global_skills]
project_local = defaultdict(list)  # project_path -> [skill dicts]
for s in manifest["skills"]:
    if s["scope"] == "project-local" and s.get("project_path"):
        project_local[s["project_path"]].append(s)


def find_local_skills(project_dir):
    """Local skills of the project whose encoded path appears in project_dir.

    Paths are encoded by replacing "/" and "." with "-". When several projects
    match (e.g. both /work/app and /work/app-v2 have local skills), the longest,
    most specific encoded path wins instead of whichever comes first.
    """
    target = project_dir.lstrip("-")
    best, best_len = [], 0
    for pp, skills in project_local.items():
        encoded = pp.replace("/", "-").replace(".", "-").lstrip("-")
        if encoded in target and len(encoded) > best_len:
            best, best_len = skills, len(encoded)
    return best


# Separate sessions: projects with local skills vs global-only
global_only_indices = []  # can be pooled
local_project_groups = defaultdict(list)  # project_dir -> indices
for i, s in enumerate(sessions):
    pdir = s.get("project_dir", "unknown")
    if find_local_skills(pdir):
        local_project_groups[pdir].append(i)
    else:
        global_only_indices.append(i)

# Build batches
batch_size = 60
MAX_BATCHES = 12  # Cap total sub-agents to keep cost/time bounded
batches = []

# 1) Pool all global-only sessions together
for chunk_start in range(0, len(global_only_indices), batch_size):
    batches.append({
        "session_indices": global_only_indices[chunk_start:chunk_start + batch_size],
        "label": "global-only (mixed projects)",
        "visible_skill_names": global_names,
    })

# 2) Group projects with the same local skill set, then batch together
by_skill_set = defaultdict(list)  # tuple of local names -> indices
for pdir, indices in local_project_groups.items():
    local_names = tuple(sorted(s["name"] for s in find_local_skills(pdir)))
    by_skill_set[local_names].extend(indices)

local_batches = []
for local_names, indices in by_skill_set.items():
    visible = global_names + list(local_names)
    more = "..." if len(local_names) > 3 else ""
    for chunk_start in range(0, len(indices), batch_size):
        local_batches.append({
            "session_indices": indices[chunk_start:chunk_start + batch_size],
            "label": f"local skills: {', '.join(local_names[:3])}{more}",
            "visible_skill_names": visible,
            "_local_set": set(local_names),
        })

# 3) Too many batches: repeatedly fold the smallest batch (by session count)
#    into the batch whose local skill set differs least (symmetric difference).
#    Global-only batches are never merged; at least one local batch remains.
remaining_budget = MAX_BATCHES - len(batches)
while len(local_batches) > max(remaining_budget, 1):
    smallest_idx = min(range(len(local_batches)),
                       key=lambda k: len(local_batches[k]["session_indices"]))
    smallest = local_batches.pop(smallest_idx)
    best_idx = min(range(len(local_batches)),
                   key=lambda k: len(smallest["_local_set"] ^ local_batches[k]["_local_set"]))
    target = local_batches[best_idx]
    target["session_indices"].extend(smallest["session_indices"])
    target["_local_set"] |= smallest["_local_set"]
    merged_local = sorted(target["_local_set"])
    more = "..." if len(merged_local) > 3 else ""
    target["visible_skill_names"] = global_names + merged_local
    target["label"] = f"merged local skills: {', '.join(merged_local[:3])}{more}"

# Drop the internal field and append local batches
for b in local_batches:
    b.pop("_local_set", None)
    batches.append(b)

for i, b in enumerate(batches):
    print(f"Batch {i}: {len(b['session_indices'])} sessions, "
          f"{len(b['visible_skill_names'])} skills | {b['label']}")
```
Before spawning, build each batch's DMI list: the visible skills whose manifest entry sets disable_model_invocation (the disable-model-invocation: true flag):
```python
dmi_skills = {s["name"] for s in manifest["skills"] if s.get("disable_model_invocation")}
for b in batches:
    b["dmi_skill_names"] = sorted(set(b["visible_skill_names"]) & dmi_skills)
```
Spawn sub-agents in parallel, one per batch. For each batch i:
```
Agent tool (general-purpose):
"Write all output text (health_assessment, detail, reason, suggested_fix, etc.) in [chosen language].
Read agents/routing-analyst.md from the skill-auditor skill directory for
your analysis instructions.
Read <workspace>/skill-manifest.json for skill definitions.
Read <workspace>/transcripts.json for session data.
Only analyze sessions with these indices: [list from batch].
Only evaluate against these skills: [visible_skill_names from batch].
Ignore skills not in this list; they are not available in this
project context.
These skills have disable-model-invocation: true and NEVER auto-fire:
[dmi_skill_names from batch]. Do NOT flag them as false_negative.
Write your analysis as JSON to <workspace>/batch-audit-<i>.json
following the exact schema in schemas/schemas.md (audit-report.json section)."
```
After all sub-agents complete, merge the batch results:
- Merge skill_reports by skill name (a global skill appears in every batch): combine incidents and recalculate stats per skill
- Union all competition_pairs and coverage_gaps
- Recalculate meta totals (sum sessions_analyzed, turns_analyzed, etc.)
Write merged result to <workspace>/audit-report.json.
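The following sketch of the merge assumes a hypothetical batch shape. The authoritative shape is the audit-report.json section of schemas/schemas.md. The assumptions are:
- Each entry in skill_reports carries a skill_name and a list of incidents.
- Each incident carries a verdict from the error taxonomy.
- Every numeric field in meta is an additive count.

Only verdict counts are recomputed here. Any other per-skill statistics would be derived from the combined incidents in the same way.

```python
import glob
import json
import os
from collections import defaultdict


def load_batches(workspace):
    """Read every batch-audit-<i>.json written by the routing-analyst sub-agents."""
    batches = []
    for path in sorted(glob.glob(os.path.join(workspace, "batch-audit-*.json"))):
        with open(path) as f:
            batches.append(json.load(f))
    return batches


def merge_batches(batches):
    """Merge batch audits; field names below are assumptions, not the documented schema."""
    meta = defaultdict(int)
    reports = {}  # skill name -> merged report
    pairs, gaps = [], []
    for batch in batches:
        # Sum numeric meta totals (sessions_analyzed, turns_analyzed, ...)
        for key, value in batch.get("meta", {}).items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                meta[key] += value
        # A global skill appears in every batch, so merge its reports by name
        for rep in batch.get("skill_reports", []):
            merged = reports.setdefault(
                rep["skill_name"], {"skill_name": rep["skill_name"], "incidents": []})
            merged["incidents"].extend(rep.get("incidents", []))
        pairs.extend(batch.get("competition_pairs", []))
        gaps.extend(batch.get("coverage_gaps", []))
    # Recalculate per-skill verdict counts from the combined incidents
    for rep in reports.values():
        counts = defaultdict(int)
        for incident in rep["incidents"]:
            counts[incident["verdict"]] += 1
        rep["verdict_counts"] = dict(counts)
    return {
        "meta": dict(meta),
        "skill_reports": list(reports.values()),
        "competition_pairs": pairs,
        "coverage_gaps": gaps,
    }
```

The merged dictionary can then be written with json.dump to <workspace>/audit-report.json. Summing every numeric meta field is only right for additive counts. A non-additive field such as an average would need its own rule.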
Step 5: Portfolio Analysis (Sub-agent)
Spawn a portfolio-analyst sub-agent:
```
Agent tool (general-purpose):
"Write all output text (health_assessment, detail, reason, suggested_fix, etc.) in [chosen language].
Read agents/portfolio-analyst.md from the skill-auditor skill directory.
Read <workspace>/skill-manifest.json for skill definitions and attention budget.
Read <workspace>/audit-report.json for the routing audit results.
Write your portfolio analysis as JSON to <workspace>/portfolio-analysis.json."
```
Step 6: Improvement Plan (Sub-agent)
Spawn an improvement-planner sub-agent:
```
Agent tool (general-purpose):
"Read agents/improvement-planner.md from the skill-auditor skill directory.
Read <workspace>/audit-report.json for routing audit results.
Read <workspace>/portfolio-analysis.json for portfolio analysis.
Read <workspace>/skill-manifest.json for current skill definitions.
IMPORTANT: Write ALL output text in [chosen language]. This includes
fixes_issues, changes_made, cascade_risk, expected_impact, rationale,
suggested_description, and every other human-readable string field.
Write your improvement proposals as JSON to <workspace>/improvement-proposals.json.
Also write individual patch files to <workspace>/patches/ directory."
```
Step 7: Generate HTML Report
```bash
python3 scripts/generate_report.py \
  --workspace <workspace>
```
Output: <workspace>/skill-audit-report.html.
Open the report in the browser (open is the macOS command; on most Linux desktops, use xdg-open instead):
```bash
open <workspace>/skill-audit-report.html
```
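Because open exists only on macOS, Python's standard webbrowser module is a portable alternative. A small sketch with illustrative helper names:

```python
import webbrowser
from pathlib import Path


def report_uri(workspace):
    """file:// URI for the generated report (as_uri() needs an absolute path)."""
    return (Path(workspace) / "skill-audit-report.html").resolve().as_uri()


def open_report(workspace):
    """Open the report in the default browser; False if no browser could be launched."""
    return webbrowser.open(report_uri(workspace))
```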
Step 8: Update Health History
Read <base_dir>/health-history.json (if it doesn't exist, create it with an
empty array []). Append a new entry with the current run's summary:
```json
{
  "timestamp": "<ISO 8601>",
  "sessions_analyzed": <N>,
  "turns_analyzed": <N>,
  "portfolio_health": "<score>",
  "routing_accuracy_avg": <0.0-1.0>,
  "total_description_tokens": <N>,
  "competition_conflicts": <N>,
  "coverage_gaps": <N>,
  "skills_audited": <N>,
  "patches_proposed": <N>
}
```
If there's a previous entry, show the delta: "Accuracy changed from X to Y."
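The following Python sketch covers this step. The helper names are illustrative, and the entry fields are the ones listed above. It starts from an empty history when the file is missing, appends the new entry to the array, and formats the accuracy delta against the previous entry.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_history(path):
    """Existing history, or an empty list on the first run."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else []


def make_entry(**summary):
    """Prefix the run summary with an ISO 8601 UTC timestamp."""
    return {"timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"), **summary}


def accuracy_delta(history, entry):
    """Delta message against the previous entry, or None if there is nothing to compare."""
    if not history or "routing_accuracy_avg" not in history[-1]:
        return None
    before = history[-1]["routing_accuracy_avg"]
    after = entry["routing_accuracy_avg"]
    return f"Accuracy changed from {before:.2f} to {after:.2f}."


def append_entry(path, entry):
    """Append entry to the JSON array at path and return the delta message."""
    history = load_history(path)
    message = accuracy_delta(history, entry)
    history.append(entry)
    Path(path).write_text(json.dumps(history, indent=2, ensure_ascii=False), encoding="utf-8")
    return message
```

Pass the run summary fields as keyword arguments to make_entry. append_entry then returns the delta string to show the user, or None on the first run.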
Step 9: Apply Patches (User Approval)
Show the user a summary from the HTML report. For each patch, show the before/after diff and cascade risk. Let the user approve or reject each.
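This document does not spell out the patch file format. The sketch below therefore assumes a hypothetical patch JSON with "skill_name", "before", "after" and "cascade_risk" fields. It renders the before/after text as a unified diff with the standard difflib module.

```python
import difflib
import json


def load_patch(path):
    """Read one <workspace>/patches/*.patch.json file."""
    with open(path) as f:
        return json.load(f)


def render_patch_preview(patch):
    """Unified diff of before/after text plus cascade risk.

    Field names are assumptions, not the documented patch schema.
    """
    lines = list(difflib.unified_diff(
        patch["before"].splitlines(),
        patch["after"].splitlines(),
        fromfile=f"{patch['skill_name']} (current)",
        tofile=f"{patch['skill_name']} (proposed)",
        lineterm="",
    ))
    lines.append(f"Cascade risk: {patch.get('cascade_risk', 'not stated')}")
    return "\n".join(lines)
```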
For approved patches:
```bash
python3 scripts/apply_patches.py \
  --patches <workspace>/patches/ --confirm \
  --output <workspace>/changelog.md
```
Step 10: Summary
Report what was done:
- How many sessions analyzed
- How many routing issues found
- Portfolio health score
- Patches proposed / approved / applied
- New skills suggested
- Link to the HTML report
Analysis Capabilities
Routing Accuracy
Per-skill fire count, accuracy, false positives/negatives, specific incidents
with root cause analysis. See agents/routing-analyst.md for the rubric.
Attention Budget
Total description tokens across all skills. Per-skill token cost and efficiency
rating. Identifies bloated descriptions that waste attention budget.
See agents/portfolio-analyst.md.
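Token counts fall back to a character-based estimate when tiktoken is missing (see Prerequisites). A sketch of that pattern follows. The cl100k_base encoding and the 4-characters-per-token ratio are assumptions, and neither matches Claude's own tokenizer exactly. The ratio is especially rough for non-English descriptions. Treat the numbers as estimates for comparing skills against each other.

```python
def estimate_tokens_by_chars(text, chars_per_token=4.0):
    """Rough fallback: assume about 4 characters per token (an assumption)."""
    return max(1, round(len(text) / chars_per_token)) if text else 0


def count_tokens(text):
    """Use tiktoken when it is installed and loads; otherwise the character estimate."""
    try:
        import tiktoken
        encoding = tiktoken.get_encoding("cl100k_base")
    except Exception:  # not installed, or the encoding data could not be loaded
        return estimate_tokens_by_chars(text)
    return len(encoding.encode(text))


def attention_budget(descriptions, counter=count_tokens):
    """Per-skill token cost and the total across all skill descriptions."""
    per_skill = {name: counter(desc) for name, desc in descriptions.items()}
    return per_skill, sum(per_skill.values())
```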
Competition Matrix
Classifies skill-pair relationships: orthogonal / adjacent / overlapping / nested. Based on real transcript evidence, not just keyword overlap.
Portfolio-Aware Optimization
Patches consider the full skill set. Cascade checking is mandatory — each patch
states what it fixes, what it might break, and the token budget impact.
See agents/improvement-planner.md.
Error Taxonomy
| Verdict | Description |
|---|---|
| correct | Right skill loaded for the intent |
| false_negative | Skill should have loaded but didn't. High bar: task must be meaningfully worse without it |
| false_positive | Skill loaded but was irrelevant |
| confused | Wrong skill loaded instead of the correct one |
| no_skill_needed | No skill was needed for this turn (most common) |
| explicit_invocation | User explicitly called /skill-name. Not a routing event; excluded from the accuracy calculation |
| coverage_gap | User intent not covered by any existing skill |
Note on disable-model-invocation: skills with disable-model-invocation: true never
auto-fire by design. They are excluded from false_negative analysis and
listed separately in the report as "explicit-only" skills.
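The table does not pin down the accuracy formula. One plausible reading is sketched below as an assumption:
- Accuracy is correct / (correct + false_negative + false_positive + confused).
- explicit_invocation turns are skipped, as the table states.
- no_skill_needed and coverage_gap are left out because no skill was at stake.
- false_negative verdicts on explicit-only (disable-model-invocation) skills are discarded.

```python
# Verdicts counted as routing events under this assumed definition
ROUTING_EVENTS = {"correct", "false_negative", "false_positive", "confused"}


def routing_accuracy(incidents, dmi_skills=frozenset()):
    """Accuracy over routing events, or None if there were none.

    incidents: iterable of (skill_name, verdict) pairs (a hypothetical shape).
    dmi_skills: names of skills with disable-model-invocation: true.
    """
    correct = events = 0
    for skill, verdict in incidents:
        if verdict == "false_negative" and skill in dmi_skills:
            continue  # explicit-only skills never auto-fire, so this is not a miss
        if verdict not in ROUTING_EVENTS:
            continue  # explicit_invocation, no_skill_needed, coverage_gap
        events += 1
        if verdict == "correct":
            correct += 1
    return correct / events if events else None
```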
Workspace Structure
```
<base_dir>/                      # e.g. ~/.claude/skill-report/
├── health-history.json          # shared across runs (append-only)
├── 2026-03-04T18-45-23/         # run 1
│   ├── transcripts.json
│   ├── skill-manifest.json
│   ├── batch-audit-*.json
│   ├── audit-report.json
│   ├── portfolio-analysis.json
│   ├── improvement-proposals.json
│   ├── patches/*.patch.json
│   ├── skill-audit-report.html
│   └── changelog.md
└── 2026-03-04T20-12-07/         # run 2
    └── ...
```
Troubleshooting
- "No project found": Run with --cwd pointing to the project root, or use --list to see available projects.
- tiktoken not installed: Token counts will use a character-based approximation. Install it with pip install tiktoken for closer estimates.
- Large project (100+ sessions): Sessions are batched automatically (up to 60 sessions per batch) and multiple sub-agents run in parallel.
- Sub-agent produces invalid JSON: Re-run the specific sub-agent step. The rubrics in agents/ and schemas/schemas.md specify the exact schemas.
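To find which step to re-run, it helps to check each output file before merging. The sketch below flags batch files that fail to parse or that lack the top-level keys the Step 4 merge reads. The key list is taken from that step, and the full schema lives in schemas/schemas.md (audit-report.json section). Helper names are illustrative.

```python
import glob
import json
import os

# Top-level keys the Step 4 merge reads from each batch file
REQUIRED_KEYS = ("meta", "skill_reports", "competition_pairs", "coverage_gaps")


def missing_keys(data, required=REQUIRED_KEYS):
    """Problems with an already-parsed document; empty list if it looks usable."""
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    return [f"missing key: {key}" for key in required if key not in data]


def check_json_file(path, required=REQUIRED_KEYS):
    """Parse path and report problems; empty list if it looks usable."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read JSON: {exc}"]
    return missing_keys(data, required)


def batches_to_rerun(workspace):
    """Map each problematic batch-audit-<i>.json file name to its problems."""
    bad = {}
    for path in sorted(glob.glob(os.path.join(workspace, "batch-audit-*.json"))):
        problems = check_json_file(path)
        if problems:
            bad[os.path.basename(path)] = problems
    return bad
```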