python-refactor
Python Refactor
Transform complex Python into clear, maintainable code while preserving correctness. Phased workflow with safety-by-design and continuous validation. For deep references (anti-patterns, OOP principles, cognitive complexity, regression prevention), see the references/ directory.
When to invoke
- Explicit "human", "readable", "maintainable", "clean", or "refactor" request
- Code review flags comprehension or maintainability issues
- Legacy code modernization
- Onboarding / educational contexts
- Complexity metrics exceed thresholds
- Red flags: file > 500 lines with scattered functions and global state, multiple `global` statements, no clear module/class organization, configuration mixed with business logic
Do NOT invoke when
- Code is performance-critical and profiling shows perf optimization is needed first
- Code is scheduled for deletion or replacement
- External dependencies require upstream contributions instead
- User explicitly requested perf optimization over readability
Core principles (priority order)
- Prefer structured OOP for complex code -- shared state, multiple concerns, scattered globals = restructure into classes/modules. (But: simple modules with pure functions, click/argparse CLIs, and functional pipelines DON'T need to be forced into classes.)
- Clarity over cleverness -- explicit beats implicit
- Preserve correctness -- all tests pass, behavior identical
- Single Responsibility -- one thing per class/function (SOLID)
- Self-documenting structure -- code = what, comments = why
- Progressive disclosure -- reveal complexity in layers
- Reasonable performance -- never sacrifice >2× without explicit approval
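As a rough illustration of the first principle, the sketch below moves state that might otherwise sit in module-level globals into a small class and injects its one dependency. Every name here (`RateLimiter`, `clock`, `allow`) is hypothetical and chosen only for the example.

```python
import time
from typing import Callable


class RateLimiter:
    """Owns call-tracking state that might otherwise live in module globals."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window_seconds = window_seconds
        self._clock = clock  # injected so tests can control time
        self._timestamps: list[float] = []

    def allow(self) -> bool:
        """Return True if another call fits in the current window."""
        now = self._clock()
        cutoff = now - self._window_seconds
        # Drop timestamps that have aged out of the window.
        self._timestamps = [t for t in self._timestamps if t > cutoff]
        if len(self._timestamps) >= self._max_calls:
            return False
        self._timestamps.append(now)
        return True
```

Because the clock is a constructor argument, a test can pass a fake one instead of patching `time`.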
Hard constraints
- SAFETY BY DESIGN -- mandatory migration checklists for destructive changes. CREATE → SEARCH → MIGRATE → VERIFY → only then REMOVE. NEVER remove before 100% migration verified.
- STATIC ANALYSIS FIRST -- `flake8 --select=F821,E0602` (or `ruff check --select=F821`) BEFORE tests. Catches NameErrors immediately.
- PRESERVE BEHAVIOR -- all existing tests pass after.
- NO PERF REGRESSION -- never degrade > 2× without explicit approval.
- NO API CHANGES -- public APIs unchanged unless explicitly requested + documented.
- NO OVER-ENGINEERING -- simple stays simple.
- NO MAGIC -- no framework magic, no metaprogramming unless absolutely necessary.
- VALIDATE CONTINUOUSLY -- static analysis + tests after each logical change.
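One possible way to enforce "static analysis before tests, stop on the first failure" is a tiny runner like the sketch below. It assumes flake8 and pytest are installed in the current environment and that the code lives under `src/`; `run_checks` and `CHECKS` are hypothetical names, not part of this skill's scripts.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> Optional[Sequence[str]]:
    """Run commands in order; return the first one that fails, or None."""
    for command in commands:
        result = subprocess.run(command, check=False)
        if result.returncode != 0:
            return command  # stop here: later checks are not run
    return None


# Assumed layout: sources under src/, flake8 and pytest importable as modules.
CHECKS = [
    [sys.executable, "-m", "flake8", "--select=F821,E999", "src/"],
    [sys.executable, "-m", "pytest", "-x"],
]
```

Calling `run_checks(CHECKS)` after each change returns `None` when everything passes; any other return value names the failing step and is the signal to stop and revert.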
Regression prevention (MANDATORY)
Refactoring must NEVER introduce regressions. Read references/REGRESSION_PREVENTION.md before any session.
Before each session:
- Test suite passes 100%
- Coverage ≥ 80% on target code (write tests FIRST if not)
- Golden outputs captured for critical edge cases
- Static analysis baseline saved
After EACH micro-change (not at the end -- every single one):
- `flake8 --select=F821,E999` → 0 errors
- `pytest -x` → all passing
- Spot check 1 edge case for unchanged behavior
If ANY check fails: STOP → REVERT → ANALYZE → FIX APPROACH → RETRY.
ANY REGRESSION = TOTAL FAILURE.
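Golden outputs can be captured with very little machinery. The sketch below is one minimal approach, assuming the target is a pure function whose arguments and results are JSON-serializable (tuples, for instance, would come back as lists). `capture_golden` and `find_mismatches` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable


def capture_golden(func: Callable[..., Any], cases: Iterable[list], path: Path) -> None:
    """Record func's output for each argument list BEFORE refactoring."""
    records = [{"args": args, "result": func(*args)} for args in cases]
    path.write_text(json.dumps(records, indent=2))


def find_mismatches(func: Callable[..., Any], path: Path) -> list[dict]:
    """Re-run the recorded cases and return those whose output changed."""
    mismatches = []
    for record in json.loads(path.read_text()):
        actual = func(*record["args"])
        if actual != record["result"]:
            mismatches.append(
                {"args": record["args"], "expected": record["result"], "actual": actual}
            )
    return mismatches
```

Capture once against the old implementation, then call `find_mismatches` with the refactored function after each micro-change; a non-empty list means behavior changed.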
Workflow (4 phases)
Phase 1: Analysis
- Read the entire codebase section.
- Identify readability issues using `references/anti-patterns.md` (script-like/global-state, God Objects, nested conditionals, long functions, magic numbers, cryptic names).
- Assess architecture against `references/oop_principles.md` (proper classes/modules, encapsulated state, separated responsibilities, SOLID, DI vs hard-coded deps).
- Measure current metrics with `scripts/measure_complexity.py` or `scripts/analyze_multi_metrics.py`.
- Run linting analysis (see Tooling below).
- Check test coverage; identify gaps to fill BEFORE refactoring.
- Document with `assets/templates/analysis_template.md`.
Output: prioritized list of issues by impact and risk.
Phase 2: Planning
- Classify each change:
- Non-destructive (rename, docs, type hints) → low risk
- Destructive (remove globals, delete functions, replace APIs) → high risk
- For DESTRUCTIVE changes -- migration plan is MANDATORY:
- Search ALL usages of each element to be removed
- Document every usage (file, line, type)
- No complete migration plan = cannot proceed with the destructive change
- Risk assessment per change (Low/Medium/High)
- Dependency map -- what depends on this code?
- Test strategy -- what tests are needed? what might break?
- Order changes safest → riskiest
- Document expected metric improvements
Output: refactoring plan, sequenced changes, migration plans, test strategy, rollback plan.
Phase 3: Execution
Non-destructive (safe anytime)
- Rename for clarity
- Extract magic numbers/strings to named constants
- Add/improve docs and type hints
- Add guard clauses to reduce nesting
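Two of these low-risk moves, naming a magic number and adding guard clauses, are shown together in the sketch below. The function and the policy value are made up for illustration; the point is that both versions return the same result for every input.

```python
from typing import Optional

MIN_PASSWORD_LENGTH = 8  # hypothetical policy value, previously a bare 8


def validate_password_nested(password: Optional[str]) -> str:
    """Before: three levels of nesting and a magic number."""
    if password is not None:
        if len(password) >= 8:
            if not password.isdigit():
                return "ok"
            else:
                return "all digits"
        else:
            return "too short"
    else:
        return "missing"


def validate_password(password: Optional[str]) -> str:
    """After: each failure case exits early, the happy path is last."""
    if password is None:
        return "missing"
    if len(password) < MIN_PASSWORD_LENGTH:
        return "too short"
    if password.isdigit():
        return "all digits"
    return "ok"
```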
Destructive (STRICT PROTOCOL)
- CREATE new structure (no removal) -- write new classes/functions + tests
- SEARCH for ALL usages of the element being removed
- CREATE migration checklist documenting every found usage
- MIGRATE one usage at a time, checking off the list, running static analysis + tests after each
- VERIFY complete migration -- re-run searches, should find zero old references
- REMOVE old code only after 100% migration verified
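The SEARCH and VERIFY steps can be partly automated with the standard `ast` module. The sketch below is a simplified, hypothetical helper (`find_usages`, `Usage`): it only sees bare-name references and `global` declarations within one source string, so attribute access such as `module.NAME`, imports, string-based lookups, and other files still need a text search on top.

```python
import ast
from typing import NamedTuple


class Usage(NamedTuple):
    filename: str
    line: int
    kind: str  # "read", "write", "delete", or "global"


_CONTEXT_KINDS = {ast.Load: "read", ast.Store: "write", ast.Del: "delete"}


def find_usages(source: str, name: str, filename: str = "<string>") -> list[Usage]:
    """List every bare-name reference to `name`, for a migration checklist."""
    usages = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Name) and node.id == name:
            usages.append(Usage(filename, node.lineno, _CONTEXT_KINDS[type(node.ctx)]))
        elif isinstance(node, ast.Global) and name in node.names:
            usages.append(Usage(filename, node.lineno, "global"))
    # ast.walk is breadth-first, so sort to get source order.
    return sorted(usages, key=lambda usage: (usage.line, usage.kind))
```

Each returned entry maps to one checklist row (file, line, type); re-running the helper after migration and getting an empty list is one piece of evidence for the VERIFY step.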
Execution rules
- NEVER skip the migration checklist for destructive changes
- Run static analysis BEFORE tests
- One pattern at a time -- never mix multiple refactoring patterns in a single change
- Atomic commits -- each migration step gets its own commit
- Stop on ANY error (static analysis OR test failure) → immediate fix/revert
Recommended order
- Transform script-like code to proper architecture (`references/examples/script_to_oop_transformation.md`)
- Rename for clarity
- Extract magic numbers/strings to constants/enums
- Improve docs + type hints
- Extract methods to reduce function length
- Simplify conditionals with guard clauses
- Reduce nesting depth
- Final review: separation of concerns
Phase 4: Validation
- Static analysis FIRST:
  flake8 <file> --select=F821,E0602   # undefined names/variables -- MUST be 0
  flake8 <file> --select=F401         # unused imports
  flake8 <file>                       # full quality check
- Full test suite → 100% pass required.
- Architecture validation: global state eliminated/encapsulated, proper modules/classes, separated responsibilities, SOLID compliance.
- Before/after metrics with `scripts/measure_complexity.py` or `scripts/analyze_multi_metrics.py`.
- Performance regression check with `scripts/benchmark_changes.py` for hot paths.
- Summary report using `assets/templates/summary_template.md`.
- Flag for human review if: perf degraded > 10%, public API signatures changed, test coverage decreased, significant architectural changes.
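The two performance thresholds in this document (flag above 10%, hard limit at 2×) can be expressed as a small classifier. This is only a sketch using the standard `timeit` module, not the contents of `scripts/benchmark_changes.py`; the function names and the returned labels are assumptions.

```python
import timeit
from typing import Callable

REVIEW_THRESHOLD = 1.10  # more than 10% slower: flag for human review
HARD_LIMIT = 2.0         # more than 2x slower: needs explicit approval


def best_time(func: Callable[[], object], repeat: int = 5, number: int = 1000) -> float:
    """Best-of-`repeat` wall time for `number` calls; min() damps scheduler noise."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


def classify_slowdown(baseline_seconds: float, candidate_seconds: float) -> str:
    """Compare refactored timing against the baseline."""
    ratio = candidate_seconds / baseline_seconds
    if ratio > HARD_LIMIT:
        return "reject"
    if ratio > REVIEW_THRESHOLD:
        return "review"
    return "ok"
```

A typical call would be `classify_slowdown(best_time(old_hot_path), best_time(new_hot_path))`, where both arguments are zero-argument callables wrapping the code under test.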
Refactoring patterns (catalog summary)
Full catalog with examples in references/patterns.md. Key patterns:
- Guard Clauses -- early returns instead of nested conditionals
- Extract Method -- split large functions into focused units (resets the nesting counter -- most powerful for cognitive complexity)
- Dictionary Dispatch -- replace if-elif chains with lookup tables
- Match Statement (Py 3.10+) -- counts as +1 total, not per branch
- Named Boolean Conditions -- extract complex booleans into named variables
- Encapsulate Global State -- move globals into classes with proper encapsulation
- Group Related Functions -- organize scattered functions into classes by responsibility
- Create Domain Models -- replace primitive dicts with dataclasses + enums
- Apply Dependency Injection -- replace hard-coded deps with injected ones
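Several of these patterns compose naturally. The sketch below combines a domain model (dataclass + enum), a lookup table in place of an if-elif chain, a guard clause, and a named boolean condition. The domain (`Plan`, `Account`, the prices and discount) is invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    FREE = "free"
    PRO = "pro"
    ENTERPRISE = "enterprise"


@dataclass(frozen=True)
class Account:
    """Domain model replacing a loose dict like {"plan": "pro", "seats": 3}."""
    plan: Plan
    seats: int
    is_active: bool


# Lookup table: one dict access replaces an if/elif chain over plan names.
PRICE_PER_SEAT = {Plan.FREE: 0, Plan.PRO: 12, Plan.ENTERPRISE: 30}
BULK_SEAT_THRESHOLD = 50
BULK_DISCOUNT_PERCENT = 10


def monthly_cost(account: Account) -> int:
    if not account.is_active:  # guard clause
        return 0
    # Named boolean condition: the intent is readable without decoding the test.
    qualifies_for_bulk_discount = (
        account.plan is Plan.ENTERPRISE and account.seats >= BULK_SEAT_THRESHOLD
    )
    cost = PRICE_PER_SEAT[account.plan] * account.seats
    if qualifies_for_bulk_discount:
        cost -= cost * BULK_DISCOUNT_PERCENT // 100
    return cost
```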
For cognitive complexity calculation rules and reduction strategies, see references/cognitive_complexity_guide.md.
Naming conventions
- Variables: descriptive, booleans as `is_active` / `has_permission` / `can_edit`, collections as plurals
- Functions: verb + object (`calculate_total`, `validate_email`); boolean queries as `is_valid()` / `has_items()`
- Constants: `UPPERCASE_WITH_UNDERSCORES`; replace magic numbers/strings
- Classes: PascalCase nouns (`UserAccount`, `PaymentProcessor`)
Anti-patterns to fix (priority order)
Full catalog: references/anti-patterns.md.
- Critical: script-like / procedural code with global state; God Object / God Class
- High: complex nested conditionals (> 3 levels), long functions (> 30 lines), magic numbers, cryptic names, missing type hints, missing docstrings
- Medium: duplicate code, primitive obsession, long parameter lists (> 5)
- Low: inconsistent naming, redundant comments, unused imports
Tooling
Primary stack: Ruff + Complexipy (recommended for new projects)
for tool in ruff complexipy radon wily; do uv tool install "$tool"; done   # uv tool install takes one package per call
ruff check src/ # fast linting (Rust, replaces flake8+plugins)
complexipy src/ --max-complexity-allowed 15 # cognitive complexity (Rust)
radon mi src/ -s # maintainability index
Full configuration (pyproject.toml, pre-commit, GitHub Actions): references/cognitive_complexity_guide.md.
Alternative: flake8 + curated plugins
For projects already on flake8, see references/flake8_plugins_guide.md (curated 16-plugin selector list).
Multi-metric analysis
scripts/analyze_multi_metrics.py combines complexipy + radon + maintainability index in a single report.
| Metric | Tool | Use |
|---|---|---|
| Cognitive complexity | complexipy | Human comprehension |
| Cyclomatic complexity | ruff (C901), radon | Test planning |
| Maintainability index | radon | Overall code health |
Metric targets
- Cyclomatic complexity: < 10 per function (warning 15, error 20)
- Cognitive complexity: < 15 per function (SonarQube default; warning 20)
- Function length: < 30 lines (warning 50)
- Nesting depth: ≤ 3 levels
- Docstring coverage: > 80% for public functions
- Type-hint coverage: > 90% for public APIs
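For a quick check of the function-length and nesting-depth targets without extra tooling, the standard `ast` module is enough for a rough estimate. The sketch below is deliberately simplified and is not what complexipy or radon compute: it counts each `elif` as one extra nesting level (the AST stores it as a nested `If`) and includes nested function bodies in the parent's depth. Helper names are hypothetical.

```python
import ast

NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Deepest level of nested control-flow blocks under `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def function_metrics(source: str) -> dict[str, tuple[int, int]]:
    """Map each function name to (length_in_lines, max_nesting_depth)."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            metrics[node.name] = (length, max_nesting(node))
    return metrics
```

Functions whose tuple exceeds `(30, 3)` in either position are candidates for Extract Method or guard clauses.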
Historical tracking with Wily
Trends matter, not just thresholds. Setup + CI integration: references/cognitive_complexity_guide.md.
Common refactoring mistakes
Full guide: references/REGRESSION_PREVENTION.md. Key traps:
- Incomplete migration -- removing old code before ALL usages migrated (causes NameErrors).
- Partial pattern application -- applying refactoring to some functions but not others.
- Breaking public APIs -- changing signatures used by external code.
- Assuming tests cover everything -- tests pass but runtime errors occur (run static analysis!).
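The first and last traps often appear together. In the hypothetical function below, a constant named `MEMBER_RATE` was removed during migration but one branch still references it. A test suite that only exercises the non-member path stays green, while `flake8 --select=F821` would report the undefined name without running anything.

```python
def apply_discount(price: float, is_member: bool) -> float:
    """Hypothetical function left behind by an incomplete migration."""
    if is_member:
        # Stale reference: MEMBER_RATE no longer exists anywhere in the module.
        # Python only resolves the name when this line actually executes.
        return price * (1 - MEMBER_RATE)
    return price
```

Calling `apply_discount(100.0, False)` works; `apply_discount(100.0, True)` raises `NameError` at runtime.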
When to reach for which tool
- clean-code (cross-language plugin) -- multi-language cosmetic cleanup; renames local vars, improves comments, simplifies structure. Lowest regression risk. Use for "make this readable", "clean up naming."
- python-refactor (this skill) -- Python-only deep restructuring. OOP transformation, SOLID, complexity metrics, migration checklists, benchmark validation. Use for "refactor this module", "reduce complexity", "transform to OOP."
Escalation path: clean-code → python-refactor (safest to most thorough).
Integration
- python-tdd -- set up tests before refactoring, validate coverage after
- python-performance-optimization -- deep profiling before/after
- python-packaging -- handle pyproject.toml + distribution if refactoring a library
- uv-package-manager -- `uv run ruff`, `uv run complexipy` for tool execution
- async-python-patterns -- reference async patterns when refactoring async code
When NOT to refactor
Perf-critical optimized code (profile first), code scheduled for deletion, external deps (contribute upstream), stable legacy code nobody needs to modify.
Limitations
Cannot improve algorithmic complexity (that's an algorithm change). Cannot add domain knowledge not in code/comments. Cannot guarantee correctness without tests. Style preferences vary -- adjust to team conventions.
Examples
references/examples/:
- `script_to_oop_transformation.md` -- script → clean OOP architecture (flagship case study)
- `python_complexity_reduction.md` -- nested conditionals and long functions
- `typescript_naming_improvements.md` -- naming patterns (cross-language reference)
Success criteria
- Zero regressions -- all tests pass, behavior unchanged
- Golden master match for documented critical cases
- Complexity metrics improved (documented in summary)
- No perf regression > 10% (or explicit approval)
- Documentation coverage improved
- Code easier for humans to understand
- No new security vulnerabilities
- Atomic, well-documented git history
- Wily trend -- complexity not increased vs previous commit
- Static analysis shows improvement