cc-data-organization
Skill: cc-data-organization
STOP - Priority 1: Never Skip
| Item | Why Critical |
|---|---|
| No magic numbers in business logic | Source of silent bugs |
| Currency uses integer cents, never float | Financial bugs are lawsuits |
| No float == comparisons | Non-deterministic failures |
| Variables initialized before use | Undefined behavior |
| Boolean naming is unambiguous | Logic inversion bugs |
Skipping Priority 1 items is NEVER acceptable. They represent latent defects that will manifest later.
Modes
CHECKER
Purpose: Execute data organization checklists against code
Triggers:
- "review my variable declarations"
- "check for magic numbers"
- "review data type usage"
- "check my variable names"
Non-Triggers:
- "what type should I use for X" -> APPLIER
- "how should I name this variable" -> APPLIER
- "fix these magic numbers" -> TRANSFORMER
Checklist: See checklists.md
Metrics: See hard-data.md for Span/Live Time measures (goal: minimize both)
Output Format:
| Item | Status | Evidence | Location |
|------|--------|----------|----------|
Severity:
- VIOLATION: Fails checklist item
- WARNING: Partial compliance
- PASS: Meets requirement
APPLIER
Purpose: Guide data type selection, variable naming, and structure design
Triggers:
- "what data type should I use for..."
- "how should I name this variable"
- "best practice for enums/constants"
- "how should I organize this data"
Non-Triggers:
- "review my types" -> CHECKER
- "fix this" -> TRANSFORMER
- "audit my code" -> CHECKER
Produces: Type recommendations, naming conventions, enum patterns, constant definitions, structure designs
Constraints:
- [p.308] Eliminate semantic literals - Replace business values (`86400`, `12`, `0.07`) with named constants. Loop bounds `0`, `1` and array indices are typically fine.
- [p.295] For currency: integer cents or BCD, never float
- [p.306] Enums (language-dependent):
  - C/C++: Reserve 0 for invalid, define First/Last bounds
  - TypeScript string enums: No zero-reservation needed (no uninitialized risk)
  - Rust/Kotlin: Leverage exhaustive matching instead of bounds checks
- [p.259] Minimize scope: Declare variables in innermost block where all usages occur. Balance with testability—sometimes slightly wider scope enables testing.
- [p.263] Names describe the entity clearly: Reader should understand purpose without searching for definition. Examples: `d` (bad) → `data` (vague) → `userData` (better) → `validatedUserSubmission` (good for complex entity)
- [p.279] Problem Orientation: names refer to problem domain (`employeeData`, `printerReady`), not computing (`inputRec`, `bitFlag`)
- [p.263] Name length heuristic: 2-4 words, long enough to describe purpose, short enough to scan. Research shows 10-16 chars minimizes debugging effort [Gorla et al. 1990], but this is guidance, not a hard rule.
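A minimal sketch of how several of the constraints above might look together in TypeScript. All identifiers are hypothetical. Reading `12` as months per year and `0.07` as a 7% sales tax rate is an assumption made only for illustration.

```typescript
// Hypothetical names; the meanings assigned to 12 and 0.07 are assumptions.
const MONTHS_PER_YEAR = 12;
const SALES_TAX_BASIS_POINTS = 700; // 0.07 expressed as an integer (7.00%)
const BASIS_POINTS_PER_WHOLE = 10000;

// String enum: members have no implicit numeric default, so no 0 is reserved.
enum OrderStatus {
  Pending = "PENDING",
  Shipped = "SHIPPED",
  Cancelled = "CANCELLED",
}

// Currency stays in integer cents; rounding happens once, in one named place.
function salesTaxCents(subtotalCents: number): number {
  if (!Number.isInteger(subtotalCents)) {
    throw new RangeError("subtotalCents must be a whole number of cents");
  }
  return Math.round((subtotalCents * SALES_TAX_BASIS_POINTS) / BASIS_POINTS_PER_WHOLE);
}

function annualToMonthlyCents(annualCents: number): number {
  return Math.round(annualCents / MONTHS_PER_YEAR);
}

const subtotalCents = 1999; // $19.99
const totalCents = subtotalCents + salesTaxCents(subtotalCents);
```

Expressing the rate in basis points keeps every input an integer, so the only rounding decision sits inside `salesTaxCents`. A real system would also need an explicit rounding policy, which this sketch does not address.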
TRANSFORMER
Purpose: Fix data organization violations
Triggers:
- CHECKER findings with VIOLATION status
- "replace magic numbers with constants"
- "fix float comparison"
- "refactor these globals"
Non-Triggers:
- Large refactorings beyond data organization -> cc-refactoring-guidance
- Control flow restructuring -> cc-control-flow-quality
Input -> Output:
- Magic `86400` -> `SECONDS_PER_DAY = 86400`
- `if (a == b)` floats -> `if (Math.abs(a-b) < EPSILON)`
- `true, false, true` params -> enum values
- Unstructured variables -> grouped structure
- Direct global access -> access routines
Preserves: Behavior, unrelated code
Verification: Re-run CHECKER; VIOLATION count = 0
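The first four Input -> Output rows above could be sketched as follows. Function and type names are hypothetical, and the `EPSILON` value is an assumed tolerance that should be chosen to suit the magnitude of the data being compared.

```typescript
// Hypothetical names; EPSILON is an assumed tolerance, not a universal constant.
const SECONDS_PER_DAY = 86400;
const EPSILON = 1e-9;

// Before: if (elapsedSeconds > 86400) ...
function isOlderThanOneDay(elapsedSeconds: number): boolean {
  return elapsedSeconds > SECONDS_PER_DAY;
}

// Before: if (a == b) ...
function nearlyEqual(a: number, b: number, epsilon: number = EPSILON): boolean {
  return Math.abs(a - b) < epsilon;
}

// Before: printReport(true, false, true)
enum Orientation { Portrait = "PORTRAIT", Landscape = "LANDSCAPE" }
enum ColorMode { Color = "COLOR", Grayscale = "GRAYSCALE" }
enum Sides { Single = "SINGLE", Double = "DOUBLE" }

// Before: three loose variables passed around separately.
interface PrintOptions {
  readonly orientation: Orientation;
  readonly colorMode: ColorMode;
  readonly sides: Sides;
}

function describePrintJob(options: PrintOptions): string {
  return `${options.orientation}/${options.colorMode}/${options.sides}`;
}

const summary = describePrintJob({
  orientation: Orientation.Landscape,
  colorMode: ColorMode.Grayscale,
  sides: Sides.Double,
});
```

The call site now names each choice, so swapping two arguments becomes a compile-time error instead of a silent logic inversion.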
Rationalization Counters
| Excuse | Reality |
|---|---|
| "Everyone knows what 12 means" | Named constants aid maintenance [Glass 1991] |
| "Floats are close enough for ==" | In IEEE 754 doubles, 0.1 added 10 times yields 0.9999999999999999, not 1.0 |
| "Magic numbers are faster to type" | Debugging hard-coded literals takes far longer |
| "I don't need custom types" | One typedef change vs hundreds of declarations |
| "Short names are faster to type" | Code is read far more often than it is written; favor read-time convenience |
| "Global variables are more convenient" | Convenience writing trades against difficulty reading, debugging, modifying |
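The float accumulation row in the table above can be checked directly. This is a small illustration with hypothetical variable names, contrasting the float sum with the same sum in integer cents.

```typescript
// Ten additions of 0.1 as IEEE 754 doubles.
let floatSum = 0;
for (let i = 0; i < 10; i++) {
  floatSum += 0.1;
}
const floatSumIsExactlyOne = floatSum === 1.0; // false: floatSum is 0.9999999999999999

// The same total in integer cents (10 cents, ten times).
let centsSum = 0;
for (let i = 0; i < 10; i++) {
  centsSum += 10;
}
const centsSumIsOneDollar = centsSum === 100; // true: integer addition is exact here
```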
Sunk Cost Counters
For resisting changes to "working" code:
| Excuse | Reality |
|---|---|
| "It works, why change it?" | Violations are latent defects; "works" means "hasn't failed yet" |
| "I already invested time in this" | Time invested in bad code is lost regardless; fix now or pay more later |
| "Refactoring will break things" | The violations are already defects; you just haven't discovered how they fail yet |
| "Currency has always used floats here" | Every penny calculation is a potential lawsuit |
| "We've had no bugs from these magic numbers" | You've had bugs—you attributed them to other causes |
| "The code passed review before" | Past reviews missed issues; evidence now shows violations |
Success-Bias Warning
Past success does NOT predict future safety.
Violations that "worked for years" fail when:
- Edge cases finally occur (currency rounding in new scenarios)
- Scale changes (global variable contention under load)
- Maintenance happens (magic numbers misunderstood by new developers)
- Requirements shift (hard-coded values need changing)
Every checklist item applies regardless of past success. "Worked until it didn't" examples fill bug databases.
Modern Data Types Coverage
Beyond Code Complete's C-era focus:
Concurrent Access
When data may be accessed from multiple threads/async contexts:
- Identify shared state - Mark variables accessed across thread boundaries
- Access routines are mandatory - Never expose shared data directly
- Consider immutability - Immutable data eliminates race conditions by design
- Document thread safety - Comment whether type/routine is thread-safe
- Violations: Data races, torn reads, lost updates
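A minimal sketch of the access-routine and immutability points above, using hypothetical names. In a single JavaScript event loop this guards against interleaved async updates rather than true data races. Code that shares memory across worker threads would need additional synchronization that is not shown here.

```typescript
interface RequestStats {
  readonly requestCount: number;
  // null means "no request has been recorded yet".
  readonly firstRequestAtMs: number | null;
}

// Shared by every async handler in this module; never exposed directly.
let sharedStats: RequestStats = Object.freeze({
  requestCount: 0,
  firstRequestAtMs: null,
});

/** Safe from any async context: returns an immutable snapshot. */
function getRequestStats(): RequestStats {
  return sharedStats;
}

/**
 * Safe from any async context: builds the next state and swaps it in
 * within one synchronous step, so no await can observe a partial update.
 */
function recordRequest(nowMs: number): RequestStats {
  sharedStats = Object.freeze({
    requestCount: sharedStats.requestCount + 1,
    firstRequestAtMs: sharedStats.firstRequestAtMs ?? nowMs,
  });
  return sharedStats;
}
```

Callers hold frozen snapshots, so a handler that awaits between reading and using the stats cannot corrupt the shared state.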
Nullable/Optional Types
Modern languages use Option<T>, Maybe, T? instead of null pointers:
- Prefer non-nullable by default - Make nullability explicit and intentional
- Handle all cases - Exhaustive matching on Option/Maybe types
- Avoid null as "not found" - Use Option types or result types instead
- Document null semantics - When null is valid, document what it means
- C-style pointer guidance still applies to unsafe code
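The guidance above on explicit optionality and exhaustive handling can be sketched with a discriminated union. The `Option` type and the helper functions here are hypothetical, hand-rolled for illustration rather than taken from a library.

```typescript
// Hypothetical Option type modeled as a discriminated union.
type Option<T> = { kind: "some"; value: T } | { kind: "none" };

function some<T>(value: T): Option<T> {
  return { kind: "some", value };
}

function none<T>(): Option<T> {
  return { kind: "none" };
}

// "Not found" is an explicit Option, not a null return.
function findUserEmail(
  directory: ReadonlyMap<string, string>,
  userId: string,
): Option<string> {
  const email = directory.get(userId);
  return email === undefined ? none<string>() : some(email);
}

function describeEmail(result: Option<string>): string {
  switch (result.kind) {
    case "some":
      return `email: ${result.value}`;
    case "none":
      return "no email on file";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a case is added.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```

The `never` assignment in the default branch makes the compiler flag any new variant that `describeEmail` does not handle.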
Temporal Data
Dates and times are a common bug source:
- Store timestamps in UTC - Convert to local only for display
- Use timezone-aware types - Never use naive datetime for user-facing data
- Be explicit about precision - Seconds, milliseconds, nanoseconds?
- Name with time unit - `timeoutMs`, `durationSeconds`, not just `timeout`
- Avoid magic time values - `86400` → `SECONDS_PER_DAY`
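One way the temporal guidance above might be applied, as a sketch with hypothetical names. It stores instants as epoch milliseconds, keeps the unit in each identifier, and converts to a local zone only in a display helper.

```typescript
const SECONDS_PER_DAY = 86400;
const MS_PER_SECOND = 1000;

interface Session {
  readonly createdAtUtcMs: number; // milliseconds since the Unix epoch
  readonly timeoutMs: number;      // unit and precision live in the name
}

function isExpired(session: Session, nowUtcMs: number): boolean {
  return nowUtcMs - session.createdAtUtcMs >= session.timeoutMs;
}

// Storage form: ISO 8601 string in UTC.
function toUtcIsoString(utcMs: number): string {
  return new Date(utcMs).toISOString();
}

// Display form: converted at the edge only. The exact text depends on the
// runtime's locale data, so nothing should parse or compare this string.
function formatForDisplay(utcMs: number, timeZone: string): string {
  return new Date(utcMs).toLocaleString("en-US", { timeZone });
}

const session: Session = {
  createdAtUtcMs: Date.UTC(2024, 0, 15, 12, 0, 0),
  timeoutMs: SECONDS_PER_DAY * MS_PER_SECOND,
};
```

A call such as `formatForDisplay(session.createdAtUtcMs, "America/New_York")` would belong in presentation code only, while comparisons like `isExpired` stay in UTC milliseconds.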
Security-Sensitive Data
Secrets, tokens, API keys require special handling:
- Clear from memory after use - Don't leave secrets in variables longer than needed
- Never log sensitive data - Redact in all log statements
- Use dedicated types - `SecureString`, `SensitiveData` wrappers
- Limit scope aggressively - Shortest possible lifetime
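A rough sketch of what a dedicated wrapper could look like. The `SensitiveData` class below is a hypothetical illustration, not an existing library type. Zeroing the buffer is best effort only, since a garbage-collected runtime gives no guarantee that no other copy of the bytes exists.

```typescript
// Hypothetical wrapper: redacts itself when logged and can be wiped on demand.
class SensitiveData {
  private readonly bytes: Uint8Array;
  private cleared = false;

  constructor(secret: Uint8Array) {
    this.bytes = new Uint8Array(secret); // private copy of the caller's bytes
  }

  // Scoped access: the secret is visible only inside the callback.
  use<T>(consumer: (secret: Uint8Array) => T): T {
    if (this.cleared) {
      throw new Error("secret has already been cleared");
    }
    return consumer(this.bytes);
  }

  // Best-effort wipe of the wrapper's own buffer.
  clear(): void {
    this.bytes.fill(0);
    this.cleared = true;
  }

  // String interpolation and JSON.stringify both yield a placeholder.
  toString(): string {
    return "[REDACTED]";
  }

  toJSON(): string {
    return "[REDACTED]";
  }
}

const apiKey = new SensitiveData(new Uint8Array([0x73, 0x33, 0x63]));
const logLine = `connecting with key ${apiKey}`;
const keyLength = apiKey.use((bytes) => bytes.length);
apiKey.clear(); // shortest possible lifetime: wipe as soon as the last use is done
```

Here `logLine` contains the placeholder rather than the key bytes, and any `use` call after `clear` throws instead of handing out a zeroed buffer.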
Chain
| After | Next |
|---|---|
| Data organization verified | cc-control-flow-quality (CHECKER) |