Extreme Software Optimization
The One Rule: Profile first. Prove behavior unchanged. One change at a time.
The Loop (Mandatory)
1. BASELINE → hyperfine --warmup 3 --runs 10 'command'
2. PROFILE → cargo flamegraph / py-spy / clinic flame
3. PROVE → Golden outputs + isomorphism proof per change
4. IMPLEMENT → Score ≥ 2.0 only, one lever per commit
5. VERIFY → sha256sum -c golden_checksums.txt
6. REPEAT → Re-profile (bottlenecks shift)
Opportunity Matrix
| Hotspot | Impact (1-5) | Confidence (1-5) | Effort (1-5) | Score |
|---|---|---|---|---|
| func:line | 1-5 | 1-5 | 1-5 | Impact × Confidence ÷ Effort |
Rule: Only implement Score ≥ 2.0
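The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original skill: the `Opportunity` class, the `worth_implementing` helper, and the sample hotspot names and ratings are all hypothetical.

```python
from dataclasses import dataclass

THRESHOLD = 2.0  # "Only implement Score >= 2.0"


@dataclass(frozen=True)
class Opportunity:
    hotspot: str      # "func:line" taken from the profile (hypothetical values below)
    impact: int       # 1-5
    confidence: int   # 1-5
    effort: int       # 1-5

    @property
    def score(self) -> float:
        # Score = Impact x Confidence / Effort
        return self.impact * self.confidence / self.effort


def worth_implementing(candidates):
    """Return candidates at or above the threshold, highest score first."""
    keep = [c for c in candidates if c.score >= THRESHOLD]
    return sorted(keep, key=lambda c: c.score, reverse=True)


# Made-up ratings, for illustration only.
candidates = [
    Opportunity("parse_row:88", impact=4, confidence=4, effort=2),  # 8.0
    Opportunity("render:210", impact=2, confidence=3, effort=4),    # 1.5
    Opportunity("load_cfg:12", impact=3, confidence=2, effort=3),   # 2.0
]

for c in worth_implementing(candidates):
    print(f"{c.hotspot}: {c.score:.1f}")
```

With these sample ratings, `render:210` falls below the threshold and is dropped. The other two are returned in descending score order.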
Isomorphism Proof Template
For EVERY change, document:
## Change: [description]
- Ordering preserved: [yes/no + why]
- Tie-breaking unchanged: [yes/no + why]
- Floating-point: [identical/N/A]
- RNG seeds: [unchanged/N/A]
- Golden outputs: sha256sum -c golden_checksums.txt ✓
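Where `sha256sum` is not available, the golden-output check in the template can be approximated in Python. The sketch below assumes the golden outputs live as plain files in one directory. The function names `capture` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large outputs do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def capture(golden_dir: Path) -> dict[str, str]:
    """Map each file name in golden_dir to its SHA-256 digest."""
    return {
        p.name: sha256_of(p)
        for p in sorted(golden_dir.iterdir())
        if p.is_file()
    }


def verify(golden_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return names whose digest changed, or that were added or removed."""
    actual = capture(golden_dir)
    return sorted(
        name
        for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )
```

Call `capture` once before the change and keep the result. After the change, an empty list from `verify` plays the same role as a clean `sha256sum -c` run.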
Pattern Tiers (Quick Reference)
Tier 1: Low-Hanging Fruit
| Pattern | When | Isomorphism |
|---|---|---|
| N+1 → Batch | Sequential fetches | Same results, fewer round-trips |
| Linear → HashMap | Keyed lookups | O(n)→O(1), order may change |
| Lazy eval | Maybe-unused values | Same final values |
| Memoization | Repeated pure calls | Cached = recomputed |
| Buffer reuse | Alloc per iteration | Same contents, buffer cleared between iterations |
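As an illustration of the "Linear → HashMap" row, the sketch below replaces a linear scan with a dictionary index while keeping the scan's first-match tie-breaking, so the results stay identical. The record layout and the function names are hypothetical.

```python
def find_linear(records, key):
    """Original behavior: O(n) scan that returns the first matching record."""
    for rec in records:
        if rec["id"] == key:
            return rec
    return None


def build_index(records):
    """Build a dict index once; later lookups are O(1) on average."""
    index = {}
    for rec in records:
        # setdefault keeps the FIRST record per key, matching the linear scan.
        # A plain index[rec["id"]] = rec would keep the last one and change results.
        index.setdefault(rec["id"], rec)
    return index


# Made-up data with a duplicate key to exercise tie-breaking.
records = [{"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "a", "v": 3}]
queries = ["b", "a", "missing"]

index = build_index(records)
before = [find_linear(records, q) for q in queries]
after = [index.get(q) for q in queries]

# Isomorphism check: same results, in the same order as the queries.
assert before == after
```

Output order is preserved because the code iterates over `queries`, never over the dictionary itself.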
Tier 2: Algorithmic
| Pattern | Change | Check |
|---|---|---|
| Binary search | O(n)→O(log n) | Sorted input |
| Two-pointer | O(n²)→O(n) | Sorted or monotonic input |
| Prefix sums | O(n)→O(1) query | Static data |
| Priority queue | O(n)→O(log n) | Top-k/scheduling |
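The prefix-sum row can be sketched as follows, assuming integer data that does not change between queries. The helper names are hypothetical. With floating-point data, the subtraction may not match a direct sum bit for bit, which is the case the isomorphism template's floating-point line exists to catch.

```python
from itertools import accumulate


def build_prefix(values):
    """prefix[i] holds sum(values[:i]); built once in O(n)."""
    return [0, *accumulate(values)]


def range_sum(prefix, lo, hi):
    """Sum of values[lo:hi] in O(1), using two lookups."""
    return prefix[hi] - prefix[lo]


values = [3, 1, 4, 1, 5, 9]
prefix = build_prefix(values)

# Each O(1) query must agree with the O(n) slice-and-sum it replaces.
for lo in range(len(values) + 1):
    for hi in range(lo, len(values) + 1):
        assert range_sum(prefix, lo, hi) == sum(values[lo:hi])
```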
Tier 3: Data Structures
| Structure | Use Case |
|---|---|
| HashMap | Point lookups |
| BTreeMap | Range queries |
| SmallVec | Usually-small collections |
| Arena | Many allocations, bulk free |
| Bloom filter | Membership pre-filter |
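To make the "membership pre-filter" use case concrete, here is a minimal Bloom filter sketch placed in front of a slow lookup. The sizes, the SHA-256-based hashing scheme, and the `lookup` helper are illustrative assumptions rather than a tuned implementation.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def lookup(key, bloom, slow_store):
    """Skip the slow store when the filter proves the key is absent."""
    if not bloom.might_contain(key):
        return None
    # A false positive still lands here, and the store then returns None.
    return slow_store.get(key)


# Stand-in for an expensive store, with made-up contents.
slow_store = {"alice": 1, "bob": 2}
bloom = BloomFilter()
for key in slow_store:
    bloom.add(key)

# Isomorphism check: the pre-filter never changes the answer.
for key in ["alice", "bob", "carol", "dave"]:
    assert lookup(key, bloom, slow_store) == slow_store.get(key)
```

Because the filter never reports a stored key as absent, the pre-filter only removes work and leaves every lookup result unchanged.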
Full catalog: TECHNIQUES.md
Language Cheatsheet
| Lang | CPU Profile | Trouble Spot Grep |
|---|---|---|
| Rust | cargo flamegraph | rg '\.clone\(\)' --type rust |
| Go | go tool pprof /debug/pprof/profile | rg 'interface\{\}' --type go |
| TS | clinic flame -- node app.js | rg 'JSON\.(parse\|stringify)' --type ts |
| Python | py-spy record -o flame.svg -- python script.py | rg '\.iterrows\(\)' --type py |
Full language guides: LANGUAGE-SPECIFIC.md
Anti-Patterns (Never Do)
| ✗ | Why |
|---|---|
| Optimize without profiling | Wastes effort on non-hotspots |
| Multiple changes per commit | Can't isolate regressions |
| Assume improvement | Must measure before/after |
| Change behavior "while we're here" | Breaks isomorphism guarantee |
| Skip golden output capture | No regression detection |
Checklist (Before Any Optimization)
- Baseline captured (p50/p95/p99, throughput, memory)
- Profiled: hotspot in top 5 by % time
- Opportunity score ≥ 2.0
- Golden outputs saved
- Isomorphism proof written
- Single lever only
- Rollback plan: git revert <sha>
Tool Commands
# Benchmark
hyperfine --warmup 3 --runs 10 'command'
# Profile
cargo flamegraph # Rust CPU
heaptrack ./binary # Allocation
strace -c ./binary # Syscalls
# Verify
sha256sum golden_outputs/* > golden_checksums.txt
sha256sum -c golden_checksums.txt # After changes
References
| Need | Reference |
|---|---|
| Complete technique catalog | TECHNIQUES.md |
| Step-by-step methodology | METHODOLOGY.md |
| Language-specific guides | LANGUAGE-SPECIFIC.md |
| Advanced (Round 2+) | ADVANCED.md |
Iteration Rounds
- Round 1: Standard (N+1, indexes, batching, memoization)
- Round 2: Algorithmic (DP, convex, semirings) → ADVANCED.md
- Round 3: Exotic (suffix automata, link-cut trees)
Each round: fresh profile → new hotspots → new matrix.