# Code Qualities Assessment
Evaluate code maintainability using 5 timeless design qualities with quantifiable scoring rubrics.
## Triggers

assess code quality, evaluate maintainability, check code qualities, testability review, run quality assessment
## Quick Start

```bash
# Assess a single file
python3 scripts/assess.py --target src/services/auth.py

# Assess changed files only (CI mode)
python3 scripts/assess.py --target . --changed-only --format json

# Full module assessment with HTML report
python3 scripts/assess.py --target src/services/ --format html --output quality-report.html
```
## The 5 Code Qualities
| Quality | Question | Score 10 | Score 1-3 |
|---|---|---|---|
| Cohesion | How related are responsibilities? | Single, well-defined responsibility | Unrelated responsibilities jammed together |
| Coupling | How dependent on other code? | Minimal deps, depends on abstractions | Tightly coupled, hard-coded dependencies |
| Encapsulation | How well are internals hidden? | All internals private, minimal API | Everything public, no information hiding |
| Testability | How easily verified in isolation? | Pure functions, injected dependencies | Hard to test, requires full integration |
| Non-Redundancy | How unique is each piece of knowledge? | Zero duplication, appropriate abstractions | Pervasive copy-paste |
## When to Use
Use this skill when:
- Reviewing code quality before merge
- Identifying refactoring priorities
- Establishing quality baselines
- Teaching code design principles
- Tracking quality trends over time
- Enforcing quality gates in CI
Use analyze instead when:
- Performing broad codebase investigation
- Security assessment is the focus
- Architecture review is needed
## Process

The skill runs automated assessment via `scripts/assess.py`:

1. **Symbol Extraction**
   - Detect language
   - Use Serena (if available)
   - Extract classes/methods
2. **Quality Scoring**
   - Run 5 quality assessments
   - Apply context rules (test vs prod)
   - Aggregate symbol -> file -> module (see the sketch after this list)
3. **Comparison** (if historical data)
   - Load previous scores
   - Identify regressions/improvements
4. **Report Generation**
   - Format: markdown, JSON, or HTML
   - Include remediation guidance
   - Link to refactoring patterns
5. **Gate Enforcement** (CI mode)
   - Check thresholds
   - Exit code: 0 = pass, 10 = degraded
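The document does not specify how symbol scores roll up to file and module level. A minimal sketch, assuming an unweighted mean at each step (the real `assess.py` may weight by symbol size or severity):

```python
from statistics import mean

# Hypothetical shapes for illustration: {symbol: {quality: score}} per file.
QUALITIES = ["cohesion", "coupling", "encapsulation", "testability", "nonRedundancy"]

def aggregate(scores_by_child: dict[str, dict[str, float]]) -> dict[str, float]:
    """Roll child-level scores (symbols or files) up one level, per quality."""
    return {
        q: mean(child[q] for child in scores_by_child.values())
        for q in QUALITIES
    }

symbol_scores = {
    "User.__init__": {q: 8 for q in QUALITIES},
    "User.save":     {q: 4 for q in QUALITIES},
}
file_score = aggregate(symbol_scores)  # each quality -> 6.0
```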
## Command Reference

### Basic Usage

```bash
python3 scripts/assess.py --target <path> [options]
```

### Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `--target` | Yes | - | File, directory, or glob pattern |
| `--context` | No | production | production, test, or generated |
| `--changed-only` | No | false | Only assess changed files (git diff) |
| `--format` | No | markdown | markdown, json, or html |
| `--config` | No | .qualityrc.json | Path to config file |
| `--output` | No | stdout | Output file path |
| `--use-serena` | No | auto | auto, yes, or no (Serena integration) |
### Exit Codes
| Code | Meaning |
|---|---|
| 0 | Assessment complete, all thresholds met |
| 10 | Quality degraded vs previous run |
| 11 | Quality below configured thresholds |
| 1 | Script error (invalid args, file not found) |
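A CI wrapper can branch on these codes directly. A minimal sketch in Python (the wrapper itself is hypothetical; only the exit codes come from the table above):

```python
import subprocess
import sys

# Hypothetical CI gate: run the assessment, then map exit codes to outcomes.
result = subprocess.run(
    ["python3", "scripts/assess.py", "--target", ".", "--changed-only",
     "--format", "json", "--output", "quality.json"],
)

if result.returncode == 0:
    print("Quality maintained; passing.")
elif result.returncode in (10, 11):
    print("Quality degraded or below thresholds; failing the PR.")
    sys.exit(1)
else:
    # Script error (bad args, missing files): surface it rather than pass silently.
    sys.exit(result.returncode)
```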
## Configuration

Create `.qualityrc.json` to customize thresholds:
```json
{
  "thresholds": {
    "cohesion": { "min": 7, "warn": 5 },
    "coupling": { "max": 3, "warn": 5 },
    "encapsulation": { "min": 7, "warn": 5 },
    "testability": { "min": 6, "warn": 4 },
    "nonRedundancy": { "min": 8, "warn": 6 }
  },
  "context": {
    "test": {
      "testability": { "min": 3 }
    }
  },
  "ignore": [
    "**/generated/**",
    "**/*.pb.py",
    "**/migrations/**"
  ]
}
```
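The `context` block appears to override base thresholds for matching files (here, relaxing `testability` for tests). A sketch of one plausible merge, assuming a shallow per-quality override (the actual semantics live in `assess.py`):

```python
# Hypothetical merge: context-specific thresholds override the base ones.
def effective_thresholds(config: dict, context: str) -> dict:
    thresholds = {quality: dict(limits) for quality, limits in config["thresholds"].items()}
    for quality, override in config.get("context", {}).get(context, {}).items():
        thresholds.setdefault(quality, {}).update(override)
    return thresholds

# With the config above: effective_thresholds(cfg, "test")["testability"]["min"] == 3
```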
## Anti-Patterns
| Avoid | Why | Instead |
|---|---|---|
| Running on entire codebase every commit | Slow, noisy | Use --changed-only in CI |
| Using scores for performance reviews | Gaming the system | Focus on trend improvement |
| Blocking merges on absolute scores | Discourages refactoring old code | Block on regression only |
| Ignoring context (test vs production) | False positives | Use --context flag |
| Not configuring thresholds | One-size-fits-all does not fit | Customize .qualityrc.json |
## Verification

After running an assessment, verify that:
- All 5 qualities scored for each symbol
- Scores are 1-10 (not null or out of range)
- Remediation links provided for low scores
- Report format is valid (markdown/JSON/HTML)
- Exit code matches assessment result
- Historical data saved to .quality-cache/
## Cohesion
How strongly related are responsibilities within a boundary?
High cohesion = focused, understandable code. Low cohesion = "god objects" doing too much.
| Score | Description |
|---|---|
| 10 | Single, well-defined responsibility |
| 7-9 | Primary responsibility clear, minor supporting concerns |
| 4-6 | Multiple loosely related responsibilities |
| 1-3 | Unrelated responsibilities jammed together |
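To make the rubric concrete, a hypothetical before/after sketch (the class names are illustrative, not from the tool):

```python
# Low cohesion (~1-3): unrelated responsibilities jammed into one class.
class UserManager:
    def create_user(self, name): ...
    def send_invoice_email(self, invoice_id): ...  # billing concern
    def rotate_log_files(self): ...                # ops concern

# High cohesion (~9-10): one well-defined responsibility.
class UserRegistry:
    def create_user(self, name): ...
    def find_user(self, name): ...
```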
## Coupling
How dependent is this code on other code?
Loose coupling = independent evolution, easy testing. Tight coupling = fragile, hard to test.
| Score | Description |
|---|---|
| 10 | Minimal dependencies, depends on abstractions |
| 7-9 | Few dependencies, all explicit |
| 4-6 | Moderate dependencies, some global state |
| 1-3 | Tightly coupled, hard-coded dependencies |
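A sketch of both ends of the scale, using `typing.Protocol` as the abstraction (names are hypothetical; Example 1 below shows the same fix as reported by the tool):

```python
from typing import Protocol

class SmtpMailer:  # stand-in concrete dependency for this sketch
    def send(self, message: str) -> None:
        print(f"smtp: {message}")

# Tightly coupled (~1-3): constructs its own hard-coded dependency.
class TightReportJob:
    def run(self) -> None:
        SmtpMailer().send("report done")

# Loosely coupled (~9-10): depends on an abstraction supplied by the caller.
class Mailer(Protocol):
    def send(self, message: str) -> None: ...

class LooseReportJob:
    def __init__(self, mailer: Mailer):
        self.mailer = mailer

    def run(self) -> None:
        self.mailer.send("report done")
```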
## Encapsulation
How well are implementation details hidden?
Good encapsulation = freedom to change internals. Poor encapsulation = brittle API.
| Score | Description |
|---|---|
| 10 | All internals private, minimal public API |
| 7-9 | Mostly private, well-defined API |
| 4-6 | Some internals exposed |
| 1-3 | Everything public, no information hiding |
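A small illustrative sketch (hypothetical names):

```python
# Poor encapsulation (~1-3): internals fully exposed; callers can corrupt state.
class LeakyCounter:
    def __init__(self):
        self.values = []  # any caller may mutate or replace this list

# Good encapsulation (~9-10): internals private behind a minimal API.
class Counter:
    def __init__(self):
        self._values: list[int] = []

    def record(self, value: int) -> None:
        self._values.append(value)

    @property
    def total(self) -> int:
        return sum(self._values)
```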
## Testability
How easily can behavior be verified in isolation?
Testable code = fast feedback, confidence to refactor. Untestable code = fear of change.
| Score | Description |
|---|---|
| 10 | Pure functions, injected dependencies |
| 7-9 | Mostly testable, straightforward to mock |
| 4-6 | Moderately testable, requires setup |
| 1-3 | Hard to test, requires full integration |
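A sketch of the difference, using a hidden clock as the classic obstacle (function names are illustrative):

```python
from datetime import date

# Hard to test (~1-3): hidden dependency on the real clock.
def is_expired_untestable(expiry: date) -> bool:
    return expiry < date.today()  # result changes from day to day

# Testable (~9-10): the clock is injected; a pure function of its inputs.
def is_expired(expiry: date, today: date) -> bool:
    return expiry < today

assert is_expired(date(2020, 1, 1), today=date(2021, 1, 1))
assert not is_expired(date(2022, 1, 1), today=date(2021, 1, 1))
```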
## Non-Redundancy
How unique is each piece of knowledge?
DRY code = fix once, single source of truth. Duplication = fix N times, maintenance burden.
| Score | Description |
|---|---|
| 10 | Zero duplication, appropriate abstractions |
| 7-9 | Minimal duplication (intentional) |
| 4-6 | Moderate duplication, missed abstractions |
| 1-3 | Pervasive copy-paste |
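A sketch of duplicated knowledge versus a single source of truth (the discount rule is hypothetical):

```python
# Redundant (~1-3): the discount rule lives in two places; fixes happen twice.
def cart_total(prices: list[float]) -> float:
    subtotal = sum(prices)
    return subtotal * 0.9 if subtotal > 100 else subtotal

def quote_total(prices: list[float]) -> float:
    subtotal = sum(prices)
    return subtotal * 0.9 if subtotal > 100 else subtotal

# Non-redundant (~9-10): one function owns the rule; callers reuse it.
def apply_bulk_discount(subtotal: float) -> float:
    return subtotal * 0.9 if subtotal > 100 else subtotal

def cart_total_dry(prices: list[float]) -> float:
    return apply_bulk_discount(sum(prices))
```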
## Example 1: Single File Assessment

```bash
python3 scripts/assess.py --target src/models/user.py
```

Output:

```markdown
# Code Quality Assessment: src/models/user.py

## Summary
- **Cohesion**: 8/10
- **Coupling**: 4/10 (warning)
- **Encapsulation**: 9/10
- **Testability**: 7/10
- **Non-Redundancy**: 9/10

## Issues Found

### Coupling: 4/10 (Warning)
**Problem**: Direct instantiation of DatabaseConnection in constructor
**Impact**: Hard to test, tightly coupled to database layer
**Remediation**: Use dependency injection
- See: [Dependency Injection](references/patterns/dependency-injection.md)
- Related ADR: ADR-023 (Dependency Management)
```

Example Fix:

```python
# Before
class User:
    def __init__(self):
        self.db = DatabaseConnection()  # Hard-coded dependency

# After
class User:
    def __init__(self, db: DatabaseInterface):
        self.db = db  # Injected dependency
```
## Example 2: CI Integration

```bash
# In CI pipeline
python3 scripts/assess.py --target . --changed-only --format json --output quality.json

# Exit code 10 = quality degraded, fail PR
# Exit code 0 = quality maintained, pass
```
## Example 3: Full Codebase Report

```bash
python3 scripts/assess.py --target src/ --format html --output reports/quality.html
```

Opens a dashboard showing:

- Quality trends over time
- Hot spots (lowest scoring files)
- Improvement opportunities
- Top refactoring priorities
## With planner

```bash
# Identify refactoring targets
python3 scripts/assess.py --target src/ --format json | \
  jq '.files | sort_by(.overall) | .[0:5]' > low-quality-files.json

# Feed to planner
planner --input low-quality-files.json --goal "Refactor lowest quality files"
```
## With adr-review

When reviewing ADRs, include quality impact:

```bash
# Before implementing ADR
python3 scripts/assess.py --target affected-files.txt > baseline.md

# After implementing ADR
python3 scripts/assess.py --target affected-files.txt > post-implementation.md

# Compare
diff baseline.md post-implementation.md
```
## With analyze

Combine broad analysis with focused quality metrics:

```bash
# First: broad exploration
analyze --target src/

# Then: quality deep dive on problem areas
python3 scripts/assess.py --target src/services/auth.py
```
For detailed scoring methodology and examples:
- Cohesion Scoring
- Coupling Scoring
- Encapsulation Scoring
- Testability Scoring
- Non-Redundancy Scoring
- Calibration Examples
- Refactoring Patterns
## References
| File | Content |
|---|---|
| dotnet-performance-patterns.md | Allocation-free .NET patterns with quality scoring calibration |
## Language Support
| Support Level | Languages |
|---|---|
| Full | Python (.py), TypeScript/JavaScript (.ts, .js, .tsx, .jsx), C# (.cs), Java (.java), Go (.go) |
| Partial (heuristic) | Ruby (.rb), Rust (.rs), PHP (.php), Kotlin (.kt) |
Serena integration improves accuracy when available.
## Design Philosophy
This skill embodies "sergeant methods directing privates":
- Sergeant (assess.py): Orchestrates workflow, delegates to specialists
- Privates (score_*.py): Focus on one quality each, report back
Each quality scorer is cohesive (single responsibility), loosely coupled (independent), and testable (pure calculation).
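A minimal sketch of that shape, with hypothetical function names (the real `score_*.py` interfaces are not shown in this document):

```python
from collections.abc import Callable

Scorer = Callable[[str], int]  # source text -> score 1-10

# "Privates": each scorer owns exactly one quality.
def score_cohesion(source: str) -> int: ...
def score_coupling(source: str) -> int: ...

# "Sergeant": fans work out to the specialists and collects results.
SCORERS: dict[str, Scorer] = {
    "cohesion": score_cohesion,
    "coupling": score_coupling,
    # ... one entry per quality
}

def assess(source: str) -> dict[str, int]:
    return {quality: scorer(source) for quality, scorer in SCORERS.items()}
```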
### Timelessness: 9/10
These 5 qualities are computer science fundamentals:
- Cohesion and coupling: structured design, 1970s (Stevens, Myers, Constantine)
- Encapsulation: information hiding (Parnas, 1972) and a core OOP principle since the 1960s
- Testability: TDD movement (1990s-2000s)
- DRY: The Pragmatic Programmer (1999)
Language-agnostic design ensures longevity across technology shifts.