# tidy-code Review

Review source code against 10 language-agnostic structural quality principles and produce a findings report with concrete refactoring suggestions.

**Important:** This skill produces a report. Do not modify any reviewed files.
## Activation

This skill activates only when the user explicitly invokes it via the `/tidy-code` slash command. Do NOT auto-activate on natural-language requests such as "review my code," "audit the code," "clean this up," "find code smells," "make this more maintainable," or "reduce complexity" — those phrasings must not trigger this skill.
## Review Workflow

- **Select files** — Use the user's specified files. If none are specified, run `scripts/scan-source-files.sh <project-directory>` to discover source files. The `<project-directory>` argument is required — it must be the root of the user's project, NOT the skill's own directory.
- **Load rules** — Read `references/principles-quick-ref.md` for the full checklist with detection signals and thresholds.
- **Review files in parallel** — Spawn parallel sub-agents (via the Task tool) using a fast, cheap model (e.g., Claude Haiku 4.5, Gemini Flash 2.5) at medium effort for file-review sub-agents. Batch files into groups of 3–5 per sub-agent, grouping related files (same module or directory) together when possible so sub-agents can detect cross-file violations within their batch. Each sub-agent receives: its file list, the principles from `references/principles-quick-ref.md`, and instructions to produce findings in the Output Format below, loading detailed reference files on demand as violations are detected. Run up to 5 sub-agents concurrently. Once all complete, collect their findings. If a sub-agent fails, log the error and continue — do not block the rest of the review.
- **Collect and deduplicate findings** — Gather findings from all sub-agents. Remove exact duplicates if file batches shared related files. Check for cross-file violations that individual sub-agents may have missed (e.g., a dependency injected in one file but hardcoded in another within a different batch). For large repos, increase batch size rather than exceeding 5 concurrent sub-agents — use batches of up to 10 files if more than 50 app files are found, and dispatch in waves of 5 until all batches are complete.
- **Classify severity** — Use `references/severity-rubric.md` to assign high/medium/low.
- **Verify suggestions** — For each suggested rewrite, confirm it resolves the flagged violation, does not introduce a new violation of any other principle, and preserves the original behavior. If a suggestion introduces a new violation, revise it before including it.
- **Assemble report** — Write findings to `.agents/tidy/code/tidy-code-findings-YYYYMMDD.md` (create the directory if it doesn't exist; use today's date). Group findings by file, then by severity (high first). End with the summary block.
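The batching rule above (groups of 3–5, related files kept together) can be sketched as a small helper. `batch_files` is an illustrative name, not one of the skill's scripts:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def batch_files(paths, max_size=5):
    """Group files by parent directory, then split each group into
    batches of at most max_size so sub-agents see related files together."""
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[str(PurePosixPath(p).parent)].append(p)
    batches = []
    for files in by_dir.values():
        # Chunk each directory's files; the last chunk may be smaller.
        for i in range(0, len(files), max_size):
            batches.append(files[i:i + max_size])
    return batches
```

Each batch then becomes one sub-agent's file list; dispatch them in waves of 5.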
## Gotchas

- The script outputs `--- test files ---` as a literal line in stdout — strip this separator before passing the file list to sub-agents, and note which files are test files for the lighter-touch rules.
- If `scan-source-files.sh` returns an empty app-files section, abort with a user-facing message rather than spawning sub-agents with empty batches.
- Files in `/tests/` that are not test files themselves (factories, fixtures, helpers) should be reviewed as application code, not under the test light-touch rules.
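A minimal sketch of handling the separator and the empty-app-files case, assuming the script prints app files first, then the literal separator line, then test files; `split_scan_output` is a hypothetical helper:

```python
def split_scan_output(stdout: str):
    """Split scan-source-files.sh stdout into (app_files, test_files)."""
    app_files, test_files, in_tests = [], [], False
    for line in stdout.splitlines():
        line = line.strip()
        if line == "--- test files ---":
            in_tests = True  # everything after the separator is a test file
            continue
        if line:
            (test_files if in_tests else app_files).append(line)
    if not app_files:
        # Abort with a user-facing message instead of spawning empty batches.
        raise SystemExit("tidy-code: no application files found; nothing to review.")
    return app_files, test_files
```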
## Model & Effort Guidance

This skill does not require frontier-class reasoning for typical codebases. The 10 principles have concrete detection signals and named refactorings that reduce the task to structured pattern matching.

- Orchestration / deduplication: use a mid-tier model (e.g., Claude Sonnet 4.5, Gemini 2.5 Pro) at high effort.
- File-review sub-agents (structured pattern matching against 10 named signals): use a fast, cheap model (e.g., Claude Haiku 4.5, Gemini Flash 2.5) at medium effort.
- Optional escalation for very large or architecturally complex codebases: upgrade the orchestrator to a frontier reasoning model (e.g., Claude Opus 4).
**Recommended optimization — two-pass sub-agent architecture:** For large codebases or when token efficiency matters, consider splitting file review into two cheap-model passes: (1) a detection pass where sub-agents identify candidate violations by matching the 10 detection signals and output a structured list of suspects, then (2) a refactor-suggestion pass where a mid-tier model generates concrete rewrites only for confirmed violations. This reduces expensive generation to a smaller set of confirmed findings. This is a recommended optimization, not a required change to the workflow above.
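The two-pass flow reduces to plain data plumbing; here `detect` and `suggest` stand in for the cheap-model detection call and the mid-tier suggestion call, both assumptions for illustration:

```python
def two_pass_review(batches, detect, suggest):
    """Pass 1: cheap detection of candidate violations across all batches.
    Pass 2: generate rewrites only for confirmed candidates."""
    suspects = []
    for batch in batches:
        suspects.extend(detect(batch))  # cheap-model pass over every file
    confirmed = [s for s in suspects if s.get("confirmed")]
    return [suggest(s) for s in confirmed]  # mid-tier pass, confirmed only
```

The expensive `suggest` step runs on the confirmed subset only, which is where the token savings come from.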
## Output Format

Use this exact structure for each finding:

```markdown
## [file path]

### Finding [N] — [Smell name] [ID] (severity: [high|medium|low])

- **Line [N]:** `[original code snippet]`
- **Principle:** [One-sentence explanation of the violated principle]
- **Refactoring:** [Named refactoring technique]
- **Suggested:**
  [concrete rewrite as a fenced code block]
```
Example:
## src/services/order_service.py
### Finding 1 — Hidden Dependency TC-02 (severity: high)
- **Line 8:** `self.db = PostgresConnection("prod:5432")`
- **Principle:** Dependencies created internally are invisible, untestable, and tightly coupled to a specific implementation.
- **Refactoring:** Inject via constructor parameter
- **Suggested:**
```python
class OrderService:
def __init__(self, db, mailer):
self.db = db
self.mailer = mailer
```
### Finding 2 — Nested Pyramid TC-03 (severity: medium)
- **Line 34:** 3 levels of nesting in `process_order()`
- **Principle:** Each nesting level forces the reader to maintain a mental stack. Guard clauses flatten the logic.
- **Refactoring:** Replace Nested Conditional with Guard Clauses
- **Suggested:**
```python
def process_order(order):
if not order:
return None
if not order.items:
return None
if not order.payment:
raise ValueError("Missing payment")
# happy path — no nesting
```
If a file has no findings, omit it from the report entirely.
End the report with:

```markdown
## Summary

- **Files reviewed:** [N]
- **Total findings:** [N] ([N] high, [N] medium, [N] low)
- **Top issues:** [List the 2-3 most frequent violations]
- **Highest-leverage fix:** [The single change that would most improve the codebase]
```
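The summary block can be assembled mechanically once findings are deduplicated. This sketch assumes each finding is a dict with `file`, `smell`, and `severity` keys, an internal shape chosen for illustration, not mandated by this skill:

```python
from collections import Counter

def summary_block(findings, files_reviewed):
    """Render the closing Summary section from the deduplicated findings."""
    sev = Counter(f["severity"] for f in findings)
    # Most frequent smells; ties keep first-seen order (CPython dict ordering).
    top = [name for name, _ in Counter(f["smell"] for f in findings).most_common(3)]
    return (
        "## Summary\n"
        f"- **Files reviewed:** {files_reviewed}\n"
        f"- **Total findings:** {len(findings)} "
        f"({sev['high']} high, {sev['medium']} medium, {sev['low']} low)\n"
        f"- **Top issues:** {', '.join(top)}\n"
        "- **Highest-leverage fix:** (written by the orchestrator)\n"
    )
```

The highest-leverage line is left for the orchestrator, since it requires judgment over the whole findings set rather than counting.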
## When to Load Reference Files

Load references on demand to conserve context:

| File | When to load |
|---|---|
| `references/principles-quick-ref.md` | Always — load at start of every review |
| `references/severity-rubric.md` | When classifying findings |
| `references/composition-over-inheritance.md` | TC-01 candidate detected |
| `references/dependency-injection.md` | TC-02 candidate detected |
| `references/guard-clauses.md` | TC-03 candidate detected |
| `references/single-responsibility.md` | TC-04 candidate detected |
| `references/fail-fast.md` | TC-05 candidate detected |
| `references/least-surprise.md` | TC-06 candidate detected |
| `references/tell-dont-ask.md` | TC-07 candidate detected |
| `references/immutability.md` | TC-08 candidate detected |
| `references/naming.md` | TC-09 candidate detected |
| `references/functional-core-imperative-shell.md` | TC-10 candidate detected |
## Scope Rules

- **Review:** application source code — functions, classes, modules, components
- **Skip:** test fixtures/factories, generated code, migration files, configuration files (JSON/YAML/TOML), vendor/third-party code, single-use scripts under 20 lines, type declaration files (`.d.ts`)
- **Light touch:** test files — apply naming (TC-09) and guard clauses (TC-03), but do not enforce DI (TC-02) or functional core (TC-10), since test setup is inherently side-effectful
- **Do not modify reviewed files** — produce recommendations only
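These scope rules can be sketched as a path classifier; the patterns and the test-file heuristic below are illustrative assumptions, not the skill's actual logic:

```python
import re

# Assumed patterns for the "skip" bucket; real projects may need more.
SKIP_PATTERNS = [
    r"\.d\.ts$",              # type declaration files
    r"\.(json|ya?ml|toml)$",  # configuration files
    r"(^|/)vendor/",          # vendor/third-party code
    r"(^|/)migrations?/",     # migration files
]

def classify(path: str) -> str:
    """Classify a path as 'skip', 'light-touch', or 'review'."""
    if any(re.search(p, path) for p in SKIP_PATTERNS):
        return "skip"
    name = path.rsplit("/", 1)[-1]
    if name.startswith("test_") or name.endswith("_test.py"):
        return "light-touch"   # test files get TC-09/TC-03 only
    return "review"            # includes non-test helpers under /tests/
```

Note that a factory like `tests/factories.py` falls through to `review`, matching the gotcha about non-test files under `/tests/`.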
Comment-prose quality is out of scope. If a user wants prose review of source comments, run `plain-language` on the file directly.

Stale TODO/FIXME/HACK markers older than 12 months are out of scope here — `tidy-project` (TP-10 STALE MARKER) owns them, because the age signal needs git history.