code-qualities-assessment

Installation
SKILL.md

Code Qualities Assessment

Evaluate code maintainability using 5 timeless design qualities with quantifiable scoring rubrics.

Triggers

  • assess code quality
  • evaluate maintainability
  • check code qualities
  • testability review
  • run quality assessment

Quick Start

# Assess a single file
python3 scripts/assess.py --target src/services/auth.py

# Assess changed files only (CI mode)
python3 scripts/assess.py --target . --changed-only --format json

# Full module assessment with HTML report
python3 scripts/assess.py --target src/services/ --format html --output quality-report.html

The 5 Code Qualities

Quality Question Score 10 Score 1-3
Cohesion How related are responsibilities? Single, well-defined responsibility Unrelated responsibilities jammed together
Coupling How dependent on other code? Minimal deps, depends on abstractions Tightly coupled, hard-coded dependencies
Encapsulation How well are internals hidden? All internals private, minimal API Everything public, no information hiding
Testability How easily verified in isolation? Pure functions, injected dependencies Hard to test, requires full integration
Non-Redundancy How unique is each piece of knowledge? Zero duplication, appropriate abstractions Pervasive copy-paste

When to Use

Use this skill when:

  • Reviewing code quality before merge
  • Identifying refactoring priorities
  • Establishing quality baselines
  • Teaching code design principles
  • Tracking quality trends over time
  • Enforcing quality gates in CI

Use analyze instead when:

  • Performing broad codebase investigation
  • Security assessment is the focus
  • Architecture review is needed

Process

The skill runs automated assessment via scripts/assess.py:

  1. Symbol Extraction

    • Detect language
    • Use Serena (if available)
    • Extract classes/methods
  2. Quality Scoring

    • Run 5 quality assessments
    • Apply context rules (test vs prod)
    • Aggregate symbol -> file -> module
  3. Comparison (if historical data)

    • Load previous scores
    • Identify regressions/improvements
  4. Report Generation

    • Format: markdown, JSON, or HTML
    • Include remediation guidance
    • Link to refactoring patterns
  5. Gate Enforcement (CI mode)

    • Check thresholds
    • Exit code: 0=pass, 10=degraded

Command Reference

Basic Usage

python3 scripts/assess.py --target <path> [options]

Parameters

Parameter Required Default Description
--target Yes - File, directory, or glob pattern
--context No production production, test, or generated
--changed-only No false Only assess changed files (git diff)
--format No markdown markdown, json, or html
--config No .qualityrc.json Path to config file
--output No stdout Output file path
--use-serena No auto auto, yes, or no (Serena integration)

Exit Codes

Code Meaning
0 Assessment complete, all thresholds met
10 Quality degraded vs previous run
11 Quality below configured thresholds
1 Script error (invalid args, file not found)

Configuration

Create .qualityrc.json to customize thresholds:

{
  "thresholds": {
    "cohesion": { "min": 7, "warn": 5 },
    "coupling": { "max": 3, "warn": 5 },
    "encapsulation": { "min": 7, "warn": 5 },
    "testability": { "min": 6, "warn": 4 },
    "nonRedundancy": { "min": 8, "warn": 6 }
  },
  "context": {
    "test": {
      "testability": { "min": 3 }
    }
  },
  "ignore": [
    "**/generated/**",
    "**/*.pb.py",
    "**/migrations/**"
  ]
}

Anti-Patterns

Avoid Why Instead
Running on entire codebase every commit Slow, noisy Use --changed-only in CI
Using scores for performance reviews Gaming the system Focus on trend improvement
Blocking merges on absolute scores Discourages refactoring old code Block on regression only
Ignoring context (test vs production) False positives Use --context flag
Not configuring thresholds One-size-fits-all does not fit Customize .qualityrc.json

Verification

After running assessment:

  • All 5 qualities scored for each symbol
  • Scores are 1-10 (not null or out of range)
  • Remediation links provided for low scores
  • Report format is valid (markdown/JSON/HTML)
  • Exit code matches assessment result
  • Historical data saved to .quality-cache/

Cohesion

How strongly related are responsibilities within a boundary?

High cohesion = focused, understandable code. Low cohesion = "god objects" doing too much.

Score Description
10 Single, well-defined responsibility
7-9 Primary responsibility clear, minor supporting concerns
4-6 Multiple loosely related responsibilities
1-3 Unrelated responsibilities jammed together

Coupling

How dependent is this code on other code?

Loose coupling = independent evolution, easy testing. Tight coupling = fragile, hard to test.

Score Description
10 Minimal dependencies, depends on abstractions
7-9 Few dependencies, all explicit
4-6 Moderate dependencies, some global state
1-3 Tightly coupled, hard-coded dependencies

Encapsulation

How well are implementation details hidden?

Good encapsulation = freedom to change internals. Poor encapsulation = brittle API.

Score Description
10 All internals private, minimal public API
7-9 Mostly private, well-defined API
4-6 Some internals exposed
1-3 Everything public, no information hiding

Testability

How easily can behavior be verified in isolation?

Testable code = fast feedback, confidence to refactor. Untestable code = fear of change.

Score Description
10 Pure functions, injected dependencies
7-9 Mostly testable, straightforward to mock
4-6 Moderately testable, requires setup
1-3 Hard to test, requires full integration

Non-Redundancy

How unique is each piece of knowledge?

DRY code = fix once, single source of truth. Duplication = fix N times, maintenance burden.

Score Description
10 Zero duplication, appropriate abstractions
7-9 Minimal duplication (intentional)
4-6 Moderate duplication, missed abstractions
1-3 Pervasive copy-paste

Example 1: Single File Assessment

python3 scripts/assess.py --target src/models/user.py

Output:

# Code Quality Assessment: src/models/user.py

## Summary
- **Cohesion**: 8/10
- **Coupling**: 4/10 (warning)
- **Encapsulation**: 9/10
- **Testability**: 7/10
- **Non-Redundancy**: 9/10

## Issues Found

### Coupling: 4/10 (Warning)
**Problem**: Direct instantiation of DatabaseConnection in constructor

**Impact**: Hard to test, tightly coupled to database layer

**Remediation**: Use dependency injection
- See: [Dependency Injection](references/patterns/dependency-injection.md)
- Related ADR: ADR-023 (Dependency Management)

Example Fix:

# Before
class User:
    def __init__(self):
        self.db = DatabaseConnection()  # Hard-coded dependency

# After
class User:
    def __init__(self, db: DatabaseInterface):
        self.db = db  # Injected dependency

Example 2: CI Integration

# In CI pipeline
python3 scripts/assess.py --target . --changed-only --format json --output quality.json

# Exit code 10 = quality degraded, fail PR
# Exit code 0 = quality maintained, pass

Example 3: Full Codebase Report

python3 scripts/assess.py --target src/ --format html --output reports/quality.html

Opens dashboard showing:

  • Quality trends over time
  • Hot spots (lowest scoring files)
  • Improvement opportunities
  • Top refactoring priorities

With planner

# Identify refactoring targets
python3 scripts/assess.py --target src/ --format json | \
  jq '.files | sort_by(.overall) | .[0:5]' > low-quality-files.json

# Feed to planner
planner --input low-quality-files.json --goal "Refactor lowest quality files"

With adr-review

When reviewing ADRs, include quality impact:

# Before implementing ADR
python3 scripts/assess.py --target affected-files.txt > baseline.md

# After implementing ADR
python3 scripts/assess.py --target affected-files.txt > post-implementation.md

# Compare
diff baseline.md post-implementation.md

With analyze

Combine broad analysis with focused quality metrics:

# First: broad exploration
analyze --target src/

# Then: quality deep dive on problem areas
python3 scripts/assess.py --target src/services/auth.py

For detailed scoring methodology and examples:


References

File Content
dotnet-performance-patterns.md Allocation-free .NET patterns with quality scoring calibration

Language Support

Support Level Languages
Full Python (.py), TypeScript/JavaScript (.ts, .js, .tsx, .jsx), C# (.cs), Java (.java), Go (.go)
Partial (heuristic) Ruby (.rb), Rust (.rs), PHP (.php), Kotlin (.kt)

Serena integration improves accuracy when available.


Design Philosophy

This skill embodies "sergeant methods directing privates":

  • Sergeant (assess.py): Orchestrates workflow, delegates to specialists
  • Privates (score_*.py): Focus on one quality each, report back

Each quality scorer is cohesive (single responsibility), loosely coupled (independent), and testable (pure calculation).

Timelessness: 9/10

These 5 qualities are computer science fundamentals:

  • Cohesion and coupling: 1970s (Parnas, Stevens)
  • Encapsulation: Core OOP principle (1960s)
  • Testability: TDD movement (1990s-2000s)
  • DRY: Pragmatic Programmer (1999)

Language-agnostic design ensures longevity across technology shifts.

Related skills

More from rjmurillo/ai-agents

Installs
1
GitHub Stars
24
First Seen
12 days ago