# Human Taste: Code
Evaluate code and software design through human taste -- the trained judgment that detects whether abstractions are right-sized, complexity is managed, and the system will be cheap to change.
This complements the UX-focused `human-taste` skill. For full research citations, see `references/research-sources.md`.
## Why This Matters
LLM-generated code is often functional but measurably lower in design quality: studies report 42-85% more code smells in AI-generated code than in human-written code. Human taste for maintainability, abstraction quality, and structural elegance is what separates code that works from code that lasts.
Key insight: taste in code is not aesthetic preference -- it is the ability to anticipate future change cost and act on that foresight now.
## Quick Start
When asked to evaluate code:
- Identify scope -- single function, module, class hierarchy, or system architecture
- Run the rubric below across all six dimensions
- Produce a Human Taste: Code Report using the output template
- Cite specific code -- reference actual lines, names, and patterns
## Evaluation Rubric
Score each dimension 1-5. Anchor every score with concrete evidence from the code.
### 1. Abstraction Depth (weight: high)
Are modules deep -- simple interface, rich functionality hidden behind it?
| Score | Meaning |
|---|---|
| 1 | Shallow -- classes/functions expose implementation details; interface is as complex as internals |
| 2 | Leaky -- abstractions exist but callers need to know how things work inside |
| 3 | Adequate -- most modules hide internals, some leak |
| 4 | Deep -- simple interfaces hide substantial complexity (Unix file I/O pattern) |
| 5 | Elegant -- abstractions feel inevitable; you cannot imagine a simpler interface |
Look for: interface-to-implementation ratio, information hiding, whether callers need internal knowledge, general-purpose vs over-specialized APIs.
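The shallow/deep contrast can be sketched with a hypothetical Python example (all names invented for illustration):

```python
# Shallow: the interface mirrors the internals, so callers must
# understand the storage mechanics anyway (score ~1-2).
class ShallowStore:
    def open_file(self, path): ...
    def seek_to_record(self, offset): ...
    def read_raw_bytes(self, n): ...
    def decode_record(self, raw): ...

# Deep: two simple calls hide all storage, indexing, and decoding
# concerns (score ~4, the Unix file-I/O pattern).
class DeepStore:
    def __init__(self):
        self._records = {}  # stands in for file handles, indexes, caches

    def get(self, key, default=None):
        return self._records.get(key, default)

    def put(self, key, value):
        self._records[key] = value
```

The interface-to-implementation ratio is the tell: `DeepStore` could grow caching or compression internally without a single caller changing.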
### 2. Conceptual Integrity (weight: high)
Does the codebase feel like one mind designed it?
| Score | Meaning |
|---|---|
| 1 | Fragmented -- multiple conflicting patterns, naming conventions, and styles |
| 2 | Inconsistent -- some unified areas but noticeable clashes |
| 3 | Mostly consistent -- follows conventions with occasional drift |
| 4 | Cohesive -- one clear style, one approach to common problems |
| 5 | Unified -- every part reinforces the same design philosophy |
Look for: naming consistency, error handling patterns, data flow conventions, one way to do common things vs many.
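A minimal hypothetical sketch of the "one way vs. many" signal, using error signalling as the common problem:

```python
# Fragmented (score ~2): two lookup functions in the same codebase
# signal a miss in two different ways.
def find_user_v1(users, name):
    return users.get(name)           # silent None on miss

def find_user_v2(users, name):
    if name not in users:
        raise KeyError(name)         # exception on miss
    return users[name]

# Unified (score ~4+): every lookup follows one convention,
# here an (ok, value) pair.
def find_user(users, name):
    if name in users:
        return True, users[name]
    return False, None
```

Neither convention is inherently better; what costs integrity points is mixing them.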
### 3. Change Cost (weight: high)
How expensive will it be to modify this code in six months?
| Score | Meaning |
|---|---|
| 1 | Brittle -- any change risks cascading failures; high coupling everywhere |
| 2 | Rigid -- changes require touching many files; dependencies are tangled |
| 3 | Manageable -- most changes are localized but some require careful coordination |
| 4 | Flexible -- clear boundaries; changes stay contained in their module |
| 5 | Supple -- designed for change; new requirements slot in naturally |
Look for: coupling between modules, dependency direction, use of interfaces/protocols, feature toggles, test coverage of boundaries.
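Dependency direction through a small interface is the cheapest of these signals to spot. A hypothetical sketch (names invented):

```python
from typing import Protocol

# The module depends on a tiny protocol, not a concrete clock.
class Clock(Protocol):
    def now(self) -> float: ...

class Session:
    def __init__(self, clock: Clock, ttl: float):
        self._clock = clock
        self._expires = clock.now() + ttl

    def expired(self) -> bool:
        return self._clock.now() > self._expires

# Swapping the clock (wall time, frozen test time) touches no
# Session code -- the boundary absorbs the change (score ~4).
class FixedClock:
    def __init__(self, t: float):
        self._t = t

    def now(self) -> float:
        return self._t
```

When the dependency points the other way (business logic importing concrete infrastructure), every infrastructure change ripples upward.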
### 4. Simplicity (weight: medium)
Is the code as simple as the problem allows -- and no simpler?
| Score | Meaning |
|---|---|
| 1 | Over-engineered -- abstractions for hypothetical futures; patterns for pattern's sake |
| 2 | Complex -- more indirection than the problem demands |
| 3 | Balanced -- complexity matches problem complexity |
| 4 | Clean -- direct solutions; easy to trace logic flow |
| 5 | Minimal -- nothing to remove; every line earns its place |
Look for: premature abstraction, unused generality, configuration surface area, inheritance depth vs composition, "astronaut architecture."
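Premature abstraction is easiest to see side by side. A deliberately exaggerated hypothetical sketch:

```python
# Over-engineered (score ~1): strategy + factory indirection with
# no second strategy anywhere in sight.
class GreeterStrategy:
    def greet(self, name: str) -> str:
        raise NotImplementedError

class DefaultGreeterStrategy(GreeterStrategy):
    def greet(self, name: str) -> str:
        return f"Hello, {name}"

class GreeterFactory:
    @staticmethod
    def create() -> GreeterStrategy:
        return DefaultGreeterStrategy()

# Direct (score ~4-5): every line earns its place.
def greet(name: str) -> str:
    return f"Hello, {name}"
```

The question to ask of each layer: what concrete, present-day requirement does this indirection serve?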
### 5. Readability (weight: medium)
Can a new team member understand this code without an oral tradition?
| Score | Meaning |
|---|---|
| 1 | Opaque -- requires significant effort to understand basic flow |
| 2 | Dense -- understandable with effort but easy to misread |
| 3 | Clear -- straightforward logic, reasonable naming |
| 4 | Transparent -- intent is obvious; naming tells the story |
| 5 | Self-documenting -- reads like well-written prose; no surprises |
Look for: naming precision, function length, nesting depth, comment quality (explains why, not what), consistent formatting.
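The same logic at roughly score 2 and score 4, as a hypothetical sketch:

```python
# Dense (score ~2): terse names force the reader to
# reverse-engineer intent from the indexing.
def f(xs, t):
    return [x for x in xs if x[1] > t]

# Transparent (score ~4): naming tells the story; the comment
# explains the data shape (why), not the mechanics (what).
def orders_over_threshold(orders, minimum_total):
    # Each order is an (order_id, total) pair.
    return [order for order in orders if order[1] > minimum_total]
```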
### 6. Robustness (weight: low)
Does the code handle the real world -- not just the happy path?
| Score | Meaning |
|---|---|
| 1 | Fragile -- crashes on unexpected input; no error handling |
| 2 | Weak -- some error handling but inconsistent; edge cases ignored |
| 3 | Adequate -- common errors handled; some gaps |
| 4 | Solid -- errors handled consistently; graceful degradation |
| 5 | Resilient -- anticipates failure; recovers cleanly; observable |
Look for: input validation, error propagation strategy, timeout handling, null/undefined safety, logging, retry logic.
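Several of these signals combined in one hypothetical sketch (function and parameter names invented; `fetch` is an injected callable, not a real library):

```python
import time

def fetch_with_retry(fetch, url, retries=3, backoff=0.0):
    # Validate input early, with a specific error.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"unsupported URL: {url!r}")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except ConnectionError as exc:        # expected failure mode only
            last_error = exc
            time.sleep(backoff * (2 ** attempt))  # bounded exponential backoff
    raise last_error                          # propagate; never swallow
```

Score-4 territory looks like this: failures are anticipated, bounded, and re-raised rather than silently absorbed.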
## Output Template
Produce your evaluation in this format:
```markdown
# Human Taste: Code Report

**Subject:** [what was evaluated -- file, module, system]
**Language:** [primary language]
**Date:** [date]
**Overall Score:** [weighted average, 1-5, one decimal] / 5

## Scores

| Dimension | Score | Key Evidence |
|-----------|-------|--------------|
| Abstraction Depth | X/5 | [specific observation with code reference] |
| Conceptual Integrity | X/5 | [specific observation] |
| Change Cost | X/5 | [specific observation] |
| Simplicity | X/5 | [specific observation] |
| Readability | X/5 | [specific observation] |
| Robustness | X/5 | [specific observation] |

## Strengths

- [concrete strength citing specific code]
- [concrete strength citing specific code]

## Issues

- **[severity: Critical/Major/Minor]**: [specific issue] -- [why it harms long-term quality] -- [suggested refactor]

## Verdict

[2-3 sentences: what works, what does not, and the single highest-impact refactor]
```
Weighted average formula: `(AbstractionDepth*3 + ConceptualIntegrity*3 + ChangeCost*3 + Simplicity*2 + Readability*2 + Robustness*1) / 14`
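The formula transcribes directly into code; the scores below are illustrative, not from a real review:

```python
# Weights from the rubric: high = 3, medium = 2, low = 1 (sum = 14).
WEIGHTS = {
    "abstraction_depth": 3,
    "conceptual_integrity": 3,
    "change_cost": 3,
    "simplicity": 2,
    "readability": 2,
    "robustness": 1,
}

def overall_score(scores: dict) -> float:
    """Weighted average of 1-5 dimension scores, one decimal."""
    total = sum(scores[dim] * weight for dim, weight in WEIGHTS.items())
    return round(total / sum(WEIGHTS.values()), 1)
```

For example, straight 4s with a robustness of 2 give `(12+12+12+8+8+2)/14 = 3.9`.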
## Comparing Implementations
When comparing two approaches:
- Run the rubric on each independently
- Add a Comparison Table with side-by-side scores
- Identify which approach wins on change cost specifically -- this is usually the deciding factor
- Note tradeoffs honestly -- sometimes the "uglier" code is the right choice for the constraint
## Reviewing AI-Generated Code
AI-generated code has specific taste failure modes:
- Shallow modules -- many small functions/classes that just pass data through without hiding complexity
- Over-abstraction -- interface + abstract class + factory + builder for a problem that needs one function
- Inconsistent error handling -- some functions throw, some return nulls, some use result types in the same codebase
- Copy-paste variation -- similar but slightly different implementations of the same pattern
- Missing edge cases -- happy path works perfectly; error paths are afterthoughts
- Naming theater -- verbose names that sound precise but don't help (`AbstractSingletonProxyFactoryBean`)
Flag these explicitly when you detect them.
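The "shallow modules" failure mode, sketched with invented names:

```python
# Each method forwards its arguments unchanged: the wrapper's
# interface is exactly as wide as the thing it wraps, so it hides
# nothing -- and change cost rises because two layers must now
# move together. Flag this pattern explicitly.
class UserService:
    def __init__(self, repo):
        self._repo = repo

    def get_user(self, user_id):
        return self._repo.get_user(user_id)   # pure pass-through

    def save_user(self, user):
        return self._repo.save_user(user)     # pure pass-through
```

The wrapper is only justified once it adds something callers cannot get from the repo alone (validation, caching, authorization, a simpler interface).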
## When Not to Use This Skill
- UX/visual design evaluation (use the `human-taste` skill instead)
- Writing/content quality (use the `human-taste-content` skill)
- Pure performance optimization (taste is about design, not benchmarks)
- Style-only reviews (formatting, linting -- those are automated)
## Additional Resources
- For full research citations and sources, see `references/research-sources.md`
- For worked examples of the rubric in action, see `examples.md`