Skill: cc-performance-tuning
STOP - Measure First
- Don't optimize based on intuition—profile first
- Correctness before speed - Make it work, then make it fast
- <4% of code causes >50% of runtime - Find the hot spot before touching anything
CRITICAL: Even In Emergencies
Production down? Losing money? Users panicking? Especially then:
- Guessing wrong costs MORE time than a 60-second profile
- Multiple shotgun changes make rollback impossible
- Wrong "fix" can mask the real problem for days
Minimum crisis protocol (non-negotiable):
- 60-second profiler check OR recent deployment/config check
- ONE change at a time
- Revert immediately if no improvement within 5 minutes
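The 60-second profiler check above can be as small as a few lines. A minimal sketch in Python using the stdlib profiler; `suspect_entry_point` is a hypothetical stand-in for whatever code path is under suspicion:

```python
# Minimal 60-second check: profile the suspect entry point with the
# stdlib profiler and sort by cumulative time to surface the hot spot.
import cProfile
import io
import pstats

def suspect_entry_point():
    # Hypothetical stand-in for the code path under suspicion.
    total = 0
    for i in range(100_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = suspect_entry_point()
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
report = buf.getvalue()
```

The top entries of `report` tell you where time actually goes, which is the only input the crisis protocol needs before making ONE change.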
Scope & Limitations
This skill covers: Single-threaded, single-process code tuning for general-purpose computing.
NOT covered (need specialized guidance):
- Concurrency: Lock contention often dominates; profile thread states, not just CPU
- Distributed systems: Network latency ~10,000x memory; optimize RPC/serialization first
- Real-time systems: Need worst-case latency, not average; caching adds variance
- Embedded/constrained: Memory/power budgets require different tradeoffs
Quick Reference
| Threshold/Rule | Value | Source |
|---|---|---|
| Hot spot concentration | <4% causes >50% runtime | Knuth 1971 |
| Failed optimization rate | >50% produce negligible or negative results | p.607 |
| Compiler optimization gains | 40-59% improvement possible | p.596 |
| Interpreted vs compiled | PHP/Python >100x slower than C++ | Table 25-1 |
| I/O vs memory | ~1000x difference | p.591 |
Key Principles:
- Make it correct first, then make it fast
- Measure before AND after every optimization
- Profile to find hot spots; never guess
- Compiler optimization often beats manual tuning
- Code tuning is the LAST resort, not first
Core Patterns
PREREQUISITE: Only apply these patterns AFTER profiling confirms the specific code is in the <4% hot path. Applying without measurement is a skill violation.
Page Fault Loop Ordering
// BEFORE: Causes a page fault on nearly every access [ANTI-PATTERN]
for (column = 0; column < MAX_COLUMNS; column++) {
    for (row = 0; row < MAX_ROWS; row++) {
        table[row][column] = BlankTableElement();
    }
}
// AFTER: Page fault only when switching rows (up to 1000x faster)
for (row = 0; row < MAX_ROWS; row++) {
    for (column = 0; column < MAX_COLUMNS; column++) {
        table[row][column] = BlankTableElement();
    }
}
Sentinel Value in Search Loop
// BEFORE: Compound test every iteration [ANTI-PATTERN]
found = false;
i = 0;
while (!found && i < count) {
    if (item[i] == target) found = true;
    i++;
}
// AFTER: Single test per iteration (23-65% faster)
// Requires a spare slot past the end of the search range
item[count] = target; // sentinel guarantees the loop terminates
i = 0;
while (item[i] != target) {
    i++;
}
if (i < count) {
    // found at position i
}
Loop Unswitching
// BEFORE: Testing invariant condition every iteration [ANTI-PATTERN]
for (i = 0; i < count; i++) {
    if (type == TYPE_A) {
        processTypeA(item[i]);
    } else {
        processTypeB(item[i]);
    }
}
// AFTER: Test once outside loop (19-28% faster)
if (type == TYPE_A) {
    for (i = 0; i < count; i++) {
        processTypeA(item[i]);
    }
} else {
    for (i = 0; i < count; i++) {
        processTypeB(item[i]);
    }
}
Algebraic Identity in Expression
// BEFORE: Two expensive sqrt calls per comparison [ANTI-PATTERN]
if (Math.sqrt(x) < Math.sqrt(y)) {
    // ...
}
// AFTER: Algebraically equivalent, 90-99.9% faster
if (x < y) { // valid when x, y >= 0
    // ...
}
APPLIER: When to Optimize
Decision Tree (STRICT ORDER - Do NOT skip steps):
Each step is a gate. Skipping steps = wasted effort or masked problems.
1. Is the program correct and complete?
NO -> Make it correct first. STOP optimization.
YES -> Continue
2. Have you measured to find the actual bottleneck?
NO -> Profile/measure first. Do NOT guess.
YES -> Continue
3. Can requirements be relaxed?
YES -> Relax requirements. Done.
NO -> Continue
4. Can design/architecture solve it?
YES -> Fix design. Done.
NO -> Continue
5. Can algorithm/data structure solve it?
YES -> Change algorithm. Done.
NO -> Continue
6. Can compiler flags help?
YES -> Enable optimizations. Measure.
NO -> Continue
7. Is it in the <4% that causes >50% of runtime?
NO -> Do NOT optimize this code. Find actual hot spot.
YES -> PROCEED with code tuning
Code Tuning Procedure (STRICT ORDER)
1. Save working version (cannot revert without backup)
2. Make ONE change (multiple changes = unmeasurable)
3. Measure improvement (same workload, before/after)
4. Keep if faster, revert if not (no "close enough")
5. Repeat
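The procedure above can be sketched as a tiny measurement harness. A hedged example in Python using stdlib `timeit`; `original` and `candidate` are hypothetical functions standing in for the saved version and the ONE change under test:

```python
# One change at a time: time the saved version and the candidate on the
# same workload, keep the candidate only if it is measurably faster.
import timeit

def original(data):
    out = []
    for x in data:
        out.append(x * 2)
    return out

def candidate(data):
    return [x * 2 for x in data]  # the ONE change under test

workload = list(range(10_000))
assert original(workload) == candidate(workload)  # correctness first

t_before = timeit.timeit(lambda: original(workload), number=200)
t_after = timeit.timeit(lambda: candidate(workload), number=200)
keep = t_after < t_before  # revert if not faster, no "close enough"
```

If `keep` is false, revert; the procedure allows no "close enough".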
Technique Priority (by category)
Logic:
- Stop testing when answer known (use break, short-circuit)
- Order tests by frequency (most common first)
- Substitute table lookups for complex logic
- Use lazy evaluation
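The table-lookup substitution above can be sketched as follows. The category names and codes here are hypothetical, chosen only to show the shape of the transformation:

```python
# Replace a chain of conditionals with a table lookup. One dictionary
# access replaces up to five comparisons per call.
def category_if_chain(code):
    if code in ("GET", "HEAD"):
        return "read"
    elif code in ("POST", "PUT", "PATCH"):
        return "write"
    else:
        return "other"

CATEGORY_TABLE = {
    "GET": "read", "HEAD": "read",
    "POST": "write", "PUT": "write", "PATCH": "write",
}

def category_lookup(code):
    return CATEGORY_TABLE.get(code, "other")

# Both versions must agree before the lookup version is kept.
assert all(category_if_chain(c) == category_lookup(c)
           for c in ("GET", "POST", "DELETE"))
```

As always, keep the lookup version only if measurement on the real workload confirms the win.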
Loops:
- Unswitch (move invariant tests outside)
- Jam/fuse loops operating on same range
- Put busiest loop on inside
- Minimize work inside loops
- Use sentinel values for search loops
- Unroll ONLY if measured (can be -27% in Python!)
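Loop jamming from the list above can be sketched like this; the two passes and their data are hypothetical, but the shape is the point:

```python
# Loop jamming: two separate passes over the same range fused into one,
# halving loop overhead. Measure before keeping, per the -27% warning.
names = ["a", "b", "c"]
values = [1, 2, 3]

# BEFORE: two loops over the same range
upper = [None] * len(names)
doubled = [None] * len(values)
for i in range(len(names)):
    upper[i] = names[i].upper()
for i in range(len(values)):
    doubled[i] = values[i] * 2

# AFTER: one fused loop doing both pieces of work
upper2 = [None] * len(names)
doubled2 = [None] * len(values)
for i in range(len(names)):
    upper2[i] = names[i].upper()
    doubled2[i] = values[i] * 2

assert upper == upper2 and doubled == doubled2
```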
Data:
- Use integers instead of floating-point when possible
- Use fewest array dimensions
- Cache frequently computed values
- Precompute results where practical
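Caching a frequently computed value can be as simple as the stdlib decorator. A minimal sketch; `expensive` is a hypothetical computation:

```python
# Cache a frequently computed value so repeat calls hit a table instead
# of recomputing. functools.lru_cache is the stdlib way to do this.
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=None)
def expensive(n):
    calls["count"] += 1          # track real computations
    return sum(i * i for i in range(n))

first = expensive(1000)
second = expensive(1000)         # served from the cache, no recompute
assert first == second
assert calls["count"] == 1
```

The cache trades memory for time, so it belongs under Data techniques: apply it only where profiling shows the recomputation in the hot path.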
Expressions:
- Initialize at compile time
- Exploit algebraic identities
- Use strength reduction (multiplication -> addition)
- Eliminate common subexpressions
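Strength reduction (multiplication -> addition) from the list above can be sketched as replacing a per-iteration multiply with a running sum; the stride-offset computation here is a hypothetical example:

```python
# Strength reduction: replace a multiplication per iteration with a
# running addition, as when scaling an index by a fixed stride.
stride = 7
n = 100

# BEFORE: one multiplication per iteration
offsets_mul = []
for i in range(n):
    offsets_mul.append(i * stride)

# AFTER: one addition per iteration
offsets_add = []
offset = 0
for i in range(n):
    offsets_add.append(offset)
    offset += stride

assert offsets_mul == offsets_add
```

As the Visual Basic -94% result in the table below warns for polynomials, this can backfire in some environments, so keep it only after measuring.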
CHECKER: Review for Anti-Patterns
Checklist: checklists.md
Output Format:
| Item | Status | Evidence | Location |
|---|---|---|---|
| Measured before tuning? | VIOLATION | No profiler/measurement found | N/A |
| Loop unswitching opportunity | WARNING | Invariant if (debug) inside loop | app.py:142 |
Severity:
- VIOLATION: Clear anti-pattern present
- WARNING: Potential issue (needs measurement)
- PASS: No obvious performance issues
Rationalization Counters
| Excuse | Reality |
|---|---|
| "Fewer lines of code is faster" | No predictable relationship; unrolled loops often 60-74% faster [p.603] |
| "I know this operation is slow" | You must measure; rules change with every environment change |
| "I'll optimize as I go" | You'll spend 96% of time on code that doesn't matter [Pareto] |
| "Experience tells me where bottlenecks are" | No programmer has ever predicted bottlenecks without data [Newcomer] |
| "This clever trick will be faster" | Compilers optimize straightforward code better than tricky code [p.596] |
| "We need to rewrite in assembler now" | Usually <4% of code causes >50% of runtime; find it first [Knuth 1971] |
| "Fast code is as important as correct code" | Correct first, fast second. Always. |
| "I already optimized this; it will stay optimized" | Re-profile after any compiler/library/environment change |
| "This optimization always works" | Results vary wildly by language; Python -27% for loop unrolling [p.623] |
| "The theory says this should be faster" | Theory doesn't always hold; Visual Basic -94% on polynomial strength reduction [p.636] |
| "I don't need to profile small changes" | If not worth profiling, not worth degrading readability for [p.609] |
| Crisis: "We're losing $X/minute!" | Guessing wrong = paying that $X until you find real cause. 60-sec profile saves hours. |
| Crisis: "No time to profile!" | Wrong guess costs more time than profiling. Panic causes cascading errors. |
| Sunk cost: "I already spent 4 hours optimizing" | Time invested doesn't validate method. Revert all, apply with measurement. |
| Sunk cost: "It seems faster now" | "Seems faster" is not data. You may have made some faster, others slower. |
| Success streak: "I've been right 5 times" | Past success doesn't change physics. Calibration illusion: 5 wins don't predict win 6. |
Chain
| After | Next |
|---|---|
| Optimization complete | Verify design not degraded |
| Structure degraded | cc-refactoring-guidance |