extreme-software-optimization
Installation
SKILL.md
Extreme Software Optimization
The One Rule: Profile first. Prove behavior unchanged. One change at a time.
The Loop (Mandatory)
1. BASELINE → hyperfine --warmup 3 --runs 10 'command'
2. PROFILE → cargo flamegraph / py-spy / clinic flame
3. PROVE → Golden outputs + isomorphism proof per change
4. IMPLEMENT → Score ≥ 2.0 only, one lever per commit
5. VERIFY → sha256sum -c golden_checksums.txt
6. REPEAT → Re-profile (bottlenecks shift)
Opportunity Matrix
| Hotspot | Impact (1-5) | Confidence (1-5) | Effort (1-5) | Score |
|---|---|---|---|---|
| func:line | × | × | ÷ | Impact×Conf/Effort |
Rule: Only implement Score ≥ 2.0
Isomorphism Proof Template
For EVERY change, document:
## Change: [description]
- Ordering preserved: [yes/no + why]
- Tie-breaking unchanged: [yes/no + why]
- Floating-point: [identical/N/A]
- RNG seeds: [unchanged/N/A]
- Golden outputs: sha256sum -c golden_checksums.txt ✓
Pattern Tiers (Quick Reference)
Tier 1: Low-Hanging Fruit
| Pattern | When | Isomorphism |
|---|---|---|
| N+1 → Batch | Sequential fetches | Same results, fewer round-trips |
| Linear → HashMap | Keyed lookups | O(n)→O(1), order may change |
| Lazy eval | Maybe-unused values | Same final values |
| Memoization | Repeated pure calls | Cached = recomputed |
| Buffer reuse | Alloc per iteration | Zero-copy in loop |
Tier 2: Algorithmic
| Pattern | Change | Check |
|---|---|---|
| Binary search | O(n)→O(log n) | Sorted input |
| Two-pointer | O(n²)→O(n) | Structured input |
| Prefix sums | O(n)→O(1) query | Static data |
| Priority queue | O(n)→O(log n) | Top-k/scheduling |
Tier 3: Data Structures
| Structure | Use Case |
|---|---|
| HashMap | Point lookups |
| BTreeMap | Range queries |
| SmallVec | Usually-small collections |
| Arena | Many allocations, bulk free |
| Bloom filter | Membership pre-filter |
Full catalog: TECHNIQUES.md
Language Cheatsheet
| Lang | CPU Profile | Trouble Spot Grep |
|---|---|---|
| Rust | cargo flamegraph |
rg '\.clone\(\)' --type rust |
| Go | go tool pprof /debug/pprof/profile |
rg 'interface\{\}' --type go |
| TS | clinic flame -- node app.js |
rg 'JSON\.(parse|stringify)' --type ts |
| Python | py-spy record -o flame.svg -- python script.py |
rg '\.iterrows\(\)' --type py |
Full language guides: LANGUAGE-SPECIFIC.md
Anti-Patterns (Never Do)
| ✗ | Why |
|---|---|
| Optimize without profiling | Wastes effort on non-hotspots |
| Multiple changes per commit | Can't isolate regressions |
| Assume improvement | Must measure before/after |
| Change behavior "while we're here" | Breaks isomorphism guarantee |
| Skip golden output capture | No regression detection |
Checklist (Before Any Optimization)
- Baseline captured (p50/p95/p99, throughput, memory)
- Profiled: hotspot in top 5 by % time
- Opportunity score ≥ 2.0
- Golden outputs saved
- Isomorphism proof written
- Single lever only
- Rollback plan:
git revert <sha>
Tool Commands
# Benchmark
hyperfine --warmup 3 --runs 10 'command'
# Profile
cargo flamegraph # Rust CPU
heaptrack ./binary # Allocation
strace -c ./binary # Syscalls
# Verify
sha256sum golden_outputs/* > golden_checksums.txt
sha256sum -c golden_checksums.txt # After changes
References
| Need | Reference |
|---|---|
| Complete technique catalog | TECHNIQUES.md |
| Step-by-step methodology | METHODOLOGY.md |
| Language-specific guides | LANGUAGE-SPECIFIC.md |
| Advanced (Round 2+) | ADVANCED.md |
Iteration Rounds
- Round 1: Standard (N+1, indexes, batching, memoization)
- Round 2: Algorithmic (DP, convex, semirings) → ADVANCED.md
- Round 3: Exotic (suffix automata, link-cut trees)
Each round: fresh profile → new hotspots → new matrix.
Weekly Installs
1
Repository
pedronauck/kodebase-goGitHub Stars
2
First Seen
4 days ago
Security Audits
Installed on
amp1
cline1
openclaw1
opencode1
cursor1
kimi-cli1