# Code Qualities Assessment
Evaluate code maintainability using 5 timeless design qualities with quantifiable scoring rubrics.
## Triggers

assess code quality, evaluate maintainability, check code qualities, testability review, run quality assessment
## Quick Start

```bash
# Assess a single file
python3 scripts/assess.py --target src/services/auth.py

# Assess changed files only (CI mode)
python3 scripts/assess.py --target . --changed-only --format json

# Full module assessment with HTML report
python3 scripts/assess.py --target src/services/ --format html --output quality-report.html
```
## The 5 Code Qualities
| Quality | Question | Score 10 | Score 1-3 |
|---|---|---|---|
| Cohesion | How related are responsibilities? | Single, well-defined responsibility | Unrelated responsibilities jammed together |
| Coupling | How dependent on other code? | Minimal deps, depends on abstractions | Tightly coupled, hard-coded dependencies |
| Encapsulation | How well are internals hidden? | All internals private, minimal API | Everything public, no information hiding |
| Testability | How easily verified in isolation? | Pure functions, injected dependencies | Hard to test, requires full integration |
| Non-Redundancy | How unique is each piece of knowledge? | Zero duplication, appropriate abstractions | Pervasive copy-paste |
## When to Use
Use this skill when:
- Reviewing code quality before merge
- Identifying refactoring priorities
- Establishing quality baselines
- Teaching code design principles
- Tracking quality trends over time
- Enforcing quality gates in CI
Use analyze instead when:
- Performing broad codebase investigation
- Security assessment is the focus
- Architecture review is needed
## Process

The skill runs automated assessment via `scripts/assess.py`:

1. **Symbol Extraction**
   - Detect language
   - Use Serena (if available)
   - Extract classes/methods
2. **Quality Scoring**
   - Run 5 quality assessments
   - Apply context rules (test vs prod)
   - Aggregate symbol -> file -> module (see the sketch after this list)
3. **Comparison** (if historical data)
   - Load previous scores
   - Identify regressions/improvements
4. **Report Generation**
   - Format: markdown, JSON, or HTML
   - Include remediation guidance
   - Link to refactoring patterns
5. **Gate Enforcement** (CI mode)
   - Check thresholds
   - Exit code: 0 = pass, 10 = degraded
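The document does not specify how symbol scores roll up to file and module level. A minimal sketch, assuming an unweighted mean at each step (the real `assess.py` may weight by symbol size or severity):

```python
from statistics import mean

# Hypothetical shapes for illustration: {symbol: {quality: score}} per file.
QUALITIES = ["cohesion", "coupling", "encapsulation", "testability", "nonRedundancy"]

def aggregate(scores_by_child: dict[str, dict[str, float]]) -> dict[str, float]:
    """Roll child-level scores (symbols or files) up one level, per quality."""
    return {
        q: mean(child[q] for child in scores_by_child.values())
        for q in QUALITIES
    }

symbol_scores = {
    "User.__init__": {q: 8 for q in QUALITIES},
    "User.save":     {q: 4 for q in QUALITIES},
}
file_score = aggregate(symbol_scores)  # each quality -> 6.0
```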
## Command Reference

### Basic Usage

```bash
python3 scripts/assess.py --target <path> [options]
```

### Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `--target` | Yes | - | File, directory, or glob pattern |
| `--context` | No | production | production, test, or generated |
| `--changed-only` | No | false | Only assess changed files (git diff) |
| `--format` | No | markdown | markdown, json, or html |
| `--config` | No | .qualityrc.json | Path to config file |
| `--output` | No | stdout | Output file path |
| `--use-serena` | No | auto | auto, yes, or no (Serena integration) |
### Exit Codes
| Code | Meaning |
|---|---|
| 0 | Assessment complete, all thresholds met |
| 10 | Quality degraded vs previous run |
| 11 | Quality below configured thresholds |
| 1 | Script error (invalid args, file not found) |
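A CI wrapper can branch on these codes directly. A minimal sketch in Python (the wrapper itself is hypothetical; only the exit codes come from the table above):

```python
import subprocess
import sys

# Hypothetical CI gate: run the assessment, then map exit codes to outcomes.
result = subprocess.run(
    ["python3", "scripts/assess.py", "--target", ".", "--changed-only",
     "--format", "json", "--output", "quality.json"],
)

if result.returncode == 0:
    print("Quality maintained; passing.")
elif result.returncode in (10, 11):
    print("Quality degraded or below thresholds; failing the PR.")
    sys.exit(1)
else:
    # Script error (bad args, missing files): surface it rather than pass silently.
    sys.exit(result.returncode)
```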
## Configuration

Create `.qualityrc.json` to customize thresholds:
```json
{
  "thresholds": {
    "cohesion": { "min": 7, "warn": 5 },
    "coupling": { "max": 3, "warn": 5 },
    "encapsulation": { "min": 7, "warn": 5 },
    "testability": { "min": 6, "warn": 4 },
    "nonRedundancy": { "min": 8, "warn": 6 }
  },
  "context": {
    "test": {
      "testability": { "min": 3 }
    }
  },
  "ignore": [
    "**/generated/**",
    "**/*.pb.py",
    "**/migrations/**"
  ]
}
```
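The `context` block appears to override base thresholds for matching files (here, relaxing `testability` for tests). A sketch of one plausible merge, assuming a shallow per-quality override (the actual semantics live in `assess.py`):

```python
# Hypothetical merge: context-specific thresholds override the base ones.
def effective_thresholds(config: dict, context: str) -> dict:
    thresholds = {quality: dict(limits) for quality, limits in config["thresholds"].items()}
    for quality, override in config.get("context", {}).get(context, {}).items():
        thresholds.setdefault(quality, {}).update(override)
    return thresholds

# With the config above: effective_thresholds(cfg, "test")["testability"]["min"] == 3
```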
## Anti-Patterns
| Avoid | Why | Instead |
|---|---|---|
| Running on entire codebase every commit | Slow, noisy | Use --changed-only in CI |
| Using scores for performance reviews | Gaming the system | Focus on trend improvement |
| Blocking merges on absolute scores | Discourages refactoring old code | Block on regression only |
| Ignoring context (test vs production) | False positives | Use --context flag |
| Not configuring thresholds | One-size-fits-all does not fit | Customize .qualityrc.json |
## Verification

After running an assessment, verify that:
- All 5 qualities scored for each symbol
- Scores are 1-10 (not null or out of range)
- Remediation links provided for low scores
- Report format is valid (markdown/JSON/HTML)
- Exit code matches assessment result
- Historical data saved to .quality-cache/
## Cohesion
How strongly related are responsibilities within a boundary?
High cohesion = focused, understandable code. Low cohesion = "god objects" doing too much.
| Score | Description |
|---|---|
| 10 | Single, well-defined responsibility |
| 7-9 | Primary responsibility clear, minor supporting concerns |
| 4-6 | Multiple loosely related responsibilities |
| 1-3 | Unrelated responsibilities jammed together |
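To make the rubric concrete, a hypothetical before/after sketch (the class names are illustrative, not from the tool):

```python
# Low cohesion (~1-3): unrelated responsibilities jammed into one class.
class UserManager:
    def create_user(self, name): ...
    def send_invoice_email(self, invoice_id): ...  # billing concern
    def rotate_log_files(self): ...                # ops concern

# High cohesion (~9-10): one well-defined responsibility.
class UserRegistry:
    def create_user(self, name): ...
    def find_user(self, name): ...
```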
## Coupling
How dependent is this code on other code?
Loose coupling = independent evolution, easy testing. Tight coupling = fragile, hard to test.
| Score | Description |
|---|---|
| 10 | Minimal dependencies, depends on abstractions |
| 7-9 | Few dependencies, all explicit |
| 4-6 | Moderate dependencies, some global state |
| 1-3 | Tightly coupled, hard-coded dependencies |
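A sketch of both ends of the scale, using `typing.Protocol` as the abstraction (names are hypothetical; Example 1 below shows the same fix as reported by the tool):

```python
from typing import Protocol

class SmtpMailer:  # stand-in concrete dependency for this sketch
    def send(self, message: str) -> None:
        print(f"smtp: {message}")

# Tightly coupled (~1-3): constructs its own hard-coded dependency.
class TightReportJob:
    def run(self) -> None:
        SmtpMailer().send("report done")

# Loosely coupled (~9-10): depends on an abstraction supplied by the caller.
class Mailer(Protocol):
    def send(self, message: str) -> None: ...

class LooseReportJob:
    def __init__(self, mailer: Mailer):
        self.mailer = mailer

    def run(self) -> None:
        self.mailer.send("report done")
```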
## Encapsulation
How well are implementation details hidden?
Good encapsulation = freedom to change internals. Poor encapsulation = brittle API.
| Score | Description |
|---|---|
| 10 | All internals private, minimal public API |
| 7-9 | Mostly private, well-defined API |
| 4-6 | Some internals exposed |
| 1-3 | Everything public, no information hiding |
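A small illustrative sketch (hypothetical names):

```python
# Poor encapsulation (~1-3): internals fully exposed; callers can corrupt state.
class LeakyCounter:
    def __init__(self):
        self.values = []  # any caller may mutate or replace this list

# Good encapsulation (~9-10): internals private behind a minimal API.
class Counter:
    def __init__(self):
        self._values: list[int] = []

    def record(self, value: int) -> None:
        self._values.append(value)

    @property
    def total(self) -> int:
        return sum(self._values)
```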
## Testability
How easily can behavior be verified in isolation?
Testable code = fast feedback, confidence to refactor. Untestable code = fear of change.
| Score | Description |
|---|---|
| 10 | Pure functions, injected dependencies |
| 7-9 | Mostly testable, straightforward to mock |
| 4-6 | Moderately testable, requires setup |
| 1-3 | Hard to test, requires full integration |
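A sketch of the difference, using a hidden clock as the classic obstacle (function names are illustrative):

```python
from datetime import date

# Hard to test (~1-3): hidden dependency on the real clock.
def is_expired_untestable(expiry: date) -> bool:
    return expiry < date.today()  # result changes from day to day

# Testable (~9-10): the clock is injected; a pure function of its inputs.
def is_expired(expiry: date, today: date) -> bool:
    return expiry < today

assert is_expired(date(2020, 1, 1), today=date(2021, 1, 1))
assert not is_expired(date(2022, 1, 1), today=date(2021, 1, 1))
```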
## Non-Redundancy
How unique is each piece of knowledge?
DRY code = fix once, single source of truth. Duplication = fix N times, maintenance burden.
| Score | Description |
|---|---|
| 10 | Zero duplication, appropriate abstractions |
| 7-9 | Minimal duplication (intentional) |
| 4-6 | Moderate duplication, missed abstractions |
| 1-3 | Pervasive copy-paste |
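A sketch of duplicated knowledge versus a single source of truth (the discount rule is hypothetical):

```python
# Redundant (~1-3): the discount rule lives in two places; fixes happen twice.
def cart_total(prices: list[float]) -> float:
    subtotal = sum(prices)
    return subtotal * 0.9 if subtotal > 100 else subtotal

def quote_total(prices: list[float]) -> float:
    subtotal = sum(prices)
    return subtotal * 0.9 if subtotal > 100 else subtotal

# Non-redundant (~9-10): one function owns the rule; callers reuse it.
def apply_bulk_discount(subtotal: float) -> float:
    return subtotal * 0.9 if subtotal > 100 else subtotal

def cart_total_dry(prices: list[float]) -> float:
    return apply_bulk_discount(sum(prices))
```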
## Example 1: Single File Assessment

```bash
python3 scripts/assess.py --target src/models/user.py
```

Output:

```markdown
# Code Quality Assessment: src/models/user.py

## Summary
- **Cohesion**: 8/10
- **Coupling**: 4/10 (warning)
- **Encapsulation**: 9/10
- **Testability**: 7/10
- **Non-Redundancy**: 9/10

## Issues Found

### Coupling: 4/10 (Warning)
**Problem**: Direct instantiation of DatabaseConnection in constructor
**Impact**: Hard to test, tightly coupled to database layer
**Remediation**: Use dependency injection
- See: [Dependency Injection](references/patterns/dependency-injection.md)
- Related ADR: ADR-023 (Dependency Management)
```

Example Fix:

```python
# Before
class User:
    def __init__(self):
        self.db = DatabaseConnection()  # Hard-coded dependency

# After
class User:
    def __init__(self, db: DatabaseInterface):
        self.db = db  # Injected dependency
```
## Example 2: CI Integration

```bash
# In CI pipeline
python3 scripts/assess.py --target . --changed-only --format json --output quality.json

# Exit code 10 = quality degraded, fail PR
# Exit code 0 = quality maintained, pass
```
## Example 3: Full Codebase Report

```bash
python3 scripts/assess.py --target src/ --format html --output reports/quality.html
```

Opens a dashboard showing:

- Quality trends over time
- Hot spots (lowest scoring files)
- Improvement opportunities
- Top refactoring priorities
## With planner

```bash
# Identify refactoring targets
python3 scripts/assess.py --target src/ --format json | \
  jq '.files | sort_by(.overall) | .[0:5]' > low-quality-files.json

# Feed to planner
planner --input low-quality-files.json --goal "Refactor lowest quality files"
```
## With adr-review

When reviewing ADRs, include quality impact:

```bash
# Before implementing ADR
python3 scripts/assess.py --target affected-files.txt > baseline.md

# After implementing ADR
python3 scripts/assess.py --target affected-files.txt > post-implementation.md

# Compare
diff baseline.md post-implementation.md
```
## With analyze

Combine broad analysis with focused quality metrics:

```bash
# First: broad exploration
analyze --target src/

# Then: quality deep dive on problem areas
python3 scripts/assess.py --target src/services/auth.py
```
For detailed scoring methodology and examples:
- Cohesion Scoring
- Coupling Scoring
- Encapsulation Scoring
- Testability Scoring
- Non-Redundancy Scoring
- Calibration Examples
- Refactoring Patterns
## References
| File | Content |
|---|---|
| dotnet-performance-patterns.md | Allocation-free .NET patterns with quality scoring calibration |
## Language Support
| Support Level | Languages |
|---|---|
| Full | Python (.py), TypeScript/JavaScript (.ts, .js, .tsx, .jsx), C# (.cs), Java (.java), Go (.go) |
| Partial (heuristic) | Ruby (.rb), Rust (.rs), PHP (.php), Kotlin (.kt) |
Serena integration improves accuracy when available.
## Design Philosophy
This skill embodies "sergeant methods directing privates":
- Sergeant (assess.py): Orchestrates workflow, delegates to specialists
- Privates (score_*.py): Focus on one quality each, report back
Each quality scorer is cohesive (single responsibility), loosely coupled (independent), and testable (pure calculation).
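A minimal sketch of that shape, with hypothetical function names (the real `score_*.py` interfaces are not shown in this document):

```python
from collections.abc import Callable

Scorer = Callable[[str], int]  # source text -> score 1-10

# "Privates": each scorer owns exactly one quality.
def score_cohesion(source: str) -> int: ...
def score_coupling(source: str) -> int: ...

# "Sergeant": fans work out to the specialists and collects results.
SCORERS: dict[str, Scorer] = {
    "cohesion": score_cohesion,
    "coupling": score_coupling,
    # ... one entry per quality
}

def assess(source: str) -> dict[str, int]:
    return {quality: scorer(source) for quality, scorer in SCORERS.items()}
```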
### Timelessness: 9/10
These 5 qualities are computer science fundamentals:
- Cohesion and coupling: structured design, 1970s (Stevens, Myers, Constantine)
- Encapsulation: information hiding (Parnas, 1972) and a core OOP principle since the 1960s
- Testability: TDD movement (1990s-2000s)
- DRY: The Pragmatic Programmer (1999)
Language-agnostic design ensures longevity across technology shifts.