# tidy-code Review

Review source code against 10 language-agnostic structural quality principles and produce a findings report with concrete refactoring suggestions.

**Important:** This skill produces a report. Do not modify any reviewed files.
## Activation

This skill activates only when the user explicitly invokes it via the `/tidy-code` slash command. Do NOT auto-activate on natural-language requests such as "review my code," "audit the code," "clean this up," "find code smells," "make this more maintainable," or "reduce complexity" — those phrasings must not trigger this skill.
## Review Workflow

- **Select files** — Use the user's specified files. If none are specified, run `scripts/scan-source-files.sh <project-directory>` to discover source files. The `<project-directory>` argument is required — it must be the root of the user's project, NOT the skill's own directory.
- **Load rules** — Read `references/principles-quick-ref.md` for the full checklist with detection signals and thresholds.
- **Review files in parallel** — Spawn parallel sub-agents (via the Task tool) using a fast, cheap model (e.g., Claude Haiku 4.5, Gemini Flash 2.5) at medium effort for file-review sub-agents. Batch files into groups of 3–5 per sub-agent, grouping related files (same module or directory) together when possible so sub-agents can detect cross-file violations within their batch. Each sub-agent receives: its file list, the principles from `references/principles-quick-ref.md`, and instructions to produce findings in the Output Format below, loading detailed reference files on demand as violations are detected. Run up to 5 sub-agents concurrently. Once all complete, collect their findings. If a sub-agent fails, log the error and continue — do not block the rest of the review.
- **Collect and deduplicate findings** — Gather findings from all sub-agents. Remove exact duplicates if file batches shared related files. Check for cross-file violations that individual sub-agents may have missed (e.g., a dependency injected in one file but hardcoded in another within a different batch). For large repos, increase batch size rather than exceeding 5 concurrent sub-agents — use batches of up to 10 files if more than 50 app files are found, and dispatch in waves of 5 until all batches are complete.
- **Classify severity** — Use `references/severity-rubric.md` to assign high/medium/low.
- **Verify suggestions** — For each suggested rewrite, confirm it resolves the flagged violation, does not introduce a new violation of any other principle, and preserves the original behavior. If a suggestion introduces a new violation, revise it before including it.
- **Assemble report** — Write findings to `.agents/tidy/code/tidy-code-findings-YYYYMMDD.md` (create the directory if it doesn't exist; use today's date). Group findings by file, then by severity (high first). End with the summary block.
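The batching rule above (groups of 3–5, related files kept together) can be sketched as a small helper. `batch_files` is an illustrative name, not one of the skill's scripts:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def batch_files(paths, max_size=5):
    """Group files by parent directory, then split each group into
    batches of at most max_size so sub-agents see related files together."""
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[str(PurePosixPath(p).parent)].append(p)
    batches = []
    for files in by_dir.values():
        # Chunk each directory's files; the last chunk may be smaller.
        for i in range(0, len(files), max_size):
            batches.append(files[i:i + max_size])
    return batches
```

Each batch then becomes one sub-agent's file list; dispatch them in waves of 5.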
## Gotchas

- The script outputs `--- test files ---` as a literal line in stdout — strip this separator before passing the file list to sub-agents, and note which files are test files for the lighter-touch rules.
- If `scan-source-files.sh` returns an empty app-files section, abort with a user-facing message rather than spawning sub-agents with empty batches.
- Files in `/tests/` that are not test files themselves (factories, fixtures, helpers) should be reviewed as application code, not under the test light-touch rules.
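A minimal sketch of handling the separator and the empty-app-files case, assuming the script prints app files first, then the literal separator line, then test files; `split_scan_output` is a hypothetical helper:

```python
def split_scan_output(stdout: str):
    """Split scan-source-files.sh stdout into (app_files, test_files)."""
    app_files, test_files, in_tests = [], [], False
    for line in stdout.splitlines():
        line = line.strip()
        if line == "--- test files ---":
            in_tests = True  # everything after the separator is a test file
            continue
        if line:
            (test_files if in_tests else app_files).append(line)
    if not app_files:
        # Abort with a user-facing message instead of spawning empty batches.
        raise SystemExit("tidy-code: no application files found; nothing to review.")
    return app_files, test_files
```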
## Model & Effort Guidance

This skill does not require frontier-class reasoning for typical codebases. The 10 principles have concrete detection signals and named refactorings that reduce the task to structured pattern matching.

- Orchestration / deduplication: use a mid-tier model (e.g., Claude Sonnet 4.5, Gemini 2.5 Pro) at high effort.
- File-review sub-agents (structured pattern matching against 10 named signals): use a fast, cheap model (e.g., Claude Haiku 4.5, Gemini Flash 2.5) at medium effort.
- Optional escalation for very large or architecturally complex codebases: upgrade the orchestrator to a frontier reasoning model (e.g., Claude Opus 4).
**Recommended optimization — two-pass sub-agent architecture:** For large codebases or when token efficiency matters, consider splitting file review into two cheap-model passes: (1) a detection pass where sub-agents identify candidate violations by matching the 10 detection signals and output a structured list of suspects, then (2) a refactor-suggestion pass where a mid-tier model generates concrete rewrites only for confirmed violations. This reduces expensive generation to a smaller set of confirmed findings. This is a recommended optimization, not a required change to the workflow above.
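The two-pass flow reduces to plain data plumbing; here `detect` and `suggest` stand in for the cheap-model detection call and the mid-tier suggestion call, both assumptions for illustration:

```python
def two_pass_review(batches, detect, suggest):
    """Pass 1: cheap detection of candidate violations across all batches.
    Pass 2: generate rewrites only for confirmed candidates."""
    suspects = []
    for batch in batches:
        suspects.extend(detect(batch))  # cheap-model pass over every file
    confirmed = [s for s in suspects if s.get("confirmed")]
    return [suggest(s) for s in confirmed]  # mid-tier pass, confirmed only
```

The expensive `suggest` step runs on the confirmed subset only, which is where the token savings come from.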
## Output Format

Use this exact structure for each finding:

```markdown
## [file path]

### Finding [N] — [Smell name] [ID] (severity: [high|medium|low])

- **Line [N]:** `[original code snippet]`
- **Principle:** [One-sentence explanation of the violated principle]
- **Refactoring:** [Named refactoring technique]
- **Suggested:**
  [concrete rewrite as a fenced code block]
```
Example:
## src/services/order_service.py
### Finding 1 — Hidden Dependency TC-02 (severity: high)
- **Line 8:** `self.db = PostgresConnection("prod:5432")`
- **Principle:** Dependencies created internally are invisible, untestable, and tightly coupled to a specific implementation.
- **Refactoring:** Inject via constructor parameter
- **Suggested:**
```python
class OrderService:
def __init__(self, db, mailer):
self.db = db
self.mailer = mailer
```
### Finding 2 — Nested Pyramid TC-03 (severity: medium)
- **Line 34:** 3 levels of nesting in `process_order()`
- **Principle:** Each nesting level forces the reader to maintain a mental stack. Guard clauses flatten the logic.
- **Refactoring:** Replace Nested Conditional with Guard Clauses
- **Suggested:**
```python
def process_order(order):
if not order:
return None
if not order.items:
return None
if not order.payment:
raise ValueError("Missing payment")
# happy path — no nesting
```
If a file has no findings, omit it from the report entirely.
End the report with:

```markdown
## Summary

- **Files reviewed:** [N]
- **Total findings:** [N] ([N] high, [N] medium, [N] low)
- **Top issues:** [List the 2-3 most frequent violations]
- **Highest-leverage fix:** [The single change that would most improve the codebase]
```
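The summary block can be assembled mechanically once findings are deduplicated. This sketch assumes each finding is a dict with `file`, `smell`, and `severity` keys, an internal shape chosen for illustration, not mandated by this skill:

```python
from collections import Counter

def summary_block(findings, files_reviewed):
    """Render the closing Summary section from the deduplicated findings."""
    sev = Counter(f["severity"] for f in findings)
    # Most frequent smells; ties keep first-seen order (CPython dict ordering).
    top = [name for name, _ in Counter(f["smell"] for f in findings).most_common(3)]
    return (
        "## Summary\n"
        f"- **Files reviewed:** {files_reviewed}\n"
        f"- **Total findings:** {len(findings)} "
        f"({sev['high']} high, {sev['medium']} medium, {sev['low']} low)\n"
        f"- **Top issues:** {', '.join(top)}\n"
        "- **Highest-leverage fix:** (written by the orchestrator)\n"
    )
```

The highest-leverage line is left for the orchestrator, since it requires judgment over the whole findings set rather than counting.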
## When to Load Reference Files

Load references on demand to conserve context:

| File | When to load |
|---|---|
| `references/principles-quick-ref.md` | Always — load at start of every review |
| `references/severity-rubric.md` | When classifying findings |
| `references/composition-over-inheritance.md` | TC-01 candidate detected |
| `references/dependency-injection.md` | TC-02 candidate detected |
| `references/guard-clauses.md` | TC-03 candidate detected |
| `references/single-responsibility.md` | TC-04 candidate detected |
| `references/fail-fast.md` | TC-05 candidate detected |
| `references/least-surprise.md` | TC-06 candidate detected |
| `references/tell-dont-ask.md` | TC-07 candidate detected |
| `references/immutability.md` | TC-08 candidate detected |
| `references/naming.md` | TC-09 candidate detected |
| `references/functional-core-imperative-shell.md` | TC-10 candidate detected |
## Scope Rules

- **Review:** application source code — functions, classes, modules, components
- **Skip:** test fixtures/factories, generated code, migration files, configuration files (JSON/YAML/TOML), vendor/third-party code, single-use scripts under 20 lines, type declaration files (`.d.ts`)
- **Light touch:** test files — apply naming (TC-09) and guard clauses (TC-03), but do not enforce DI (TC-02) or functional core (TC-10), since test setup is inherently side-effectful
- **Do not modify reviewed files** — produce recommendations only
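These scope rules can be sketched as a path classifier; the patterns and the test-file heuristic below are illustrative assumptions, not the skill's actual logic:

```python
import re

# Assumed patterns for the "skip" bucket; real projects may need more.
SKIP_PATTERNS = [
    r"\.d\.ts$",              # type declaration files
    r"\.(json|ya?ml|toml)$",  # configuration files
    r"(^|/)vendor/",          # vendor/third-party code
    r"(^|/)migrations?/",     # migration files
]

def classify(path: str) -> str:
    """Classify a path as 'skip', 'light-touch', or 'review'."""
    if any(re.search(p, path) for p in SKIP_PATTERNS):
        return "skip"
    name = path.rsplit("/", 1)[-1]
    if name.startswith("test_") or name.endswith("_test.py"):
        return "light-touch"   # test files get TC-09/TC-03 only
    return "review"            # includes non-test helpers under /tests/
```

Note that a factory like `tests/factories.py` falls through to `review`, matching the gotcha about non-test files under `/tests/`.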
Comment-prose quality is out of scope. If a user wants prose review of source comments, run `plain-language` on the file directly.

Stale TODO/FIXME/HACK markers older than 12 months are out of scope here — `tidy-project` (TP-10 STALE MARKER) owns them, because the age signal needs git history.