Performance Profiler

Analysis-based performance review: every recommendation is grounded in evidence. Six-mode pipeline: Analyze, Profile, Cache, Benchmark, Regression, Leak-Patterns.

Scope: Performance analysis and recommendations only. NOT for running profilers, executing load tests, infrastructure monitoring, or actual memory leak detection. This skill provides analysis-based guidance, not measurements.

Canonical Vocabulary

Use these terms exactly throughout all modes:

| Term | Definition |
| --- | --- |
| complexity | Big-O algorithmic classification of a function or code path |
| hotspot | Code region with disproportionate resource consumption (time or memory) |
| bottleneck | System constraint limiting overall throughput |
| profiler output | Textual data from cProfile, py-spy, perf, or similar tools, pasted by the user |
| cache strategy | Eviction policy + write policy + invalidation approach for a caching layer |
| benchmark skeleton | Template code for measuring function performance with proper methodology |
| regression risk | Likelihood that a code change degrades performance, scored LOW/MEDIUM/HIGH/CRITICAL |
| anti-pattern | Known performance-harmful code pattern (N+1 queries, unbounded allocation, etc.) |
| evidence | Concrete proof: AST analysis, profiler data, code pattern match, or external reference |
| recommendation | Actionable optimization suggestion with expected impact and trade-offs |
| flame graph | Hierarchical visualization of call-stack sampling data |
| wall time | Elapsed real time (includes I/O waits), as opposed to CPU time (compute only) |
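
The wall-time vs CPU-time distinction can be demonstrated with Python's stdlib timers; a minimal sketch, with `time.sleep` standing in for an I/O wait:

```python
import time

start_wall = time.perf_counter()   # wall clock: elapsed real time
start_cpu = time.process_time()    # CPU clock: compute time only

time.sleep(0.2)                    # stands in for an I/O wait

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
# wall is much larger than cpu here: the section is wait-bound, not compute-bound
```

A wall/CPU ratio well above 1 for a code section is the signature of I/O-bound work.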

Dispatch

| `$ARGUMENTS` | Mode |
| --- | --- |
| `analyze <file/function>` | Algorithmic complexity analysis, Big-O review |
| `profile <data>` | Interpret textual profiler output (cProfile, py-spy, perf) |
| `cache <system>` | Caching strategy design (LRU/LFU/TTL/write-through/write-back) |
| `benchmark <code>` | Benchmark design and methodology review |
| `regression <diff>` | Performance regression risk assessment from a code diff |
| `leak-patterns` | Common memory leak pattern scan (NOT actual detection) |
| (empty) | Show the mode menu with examples for each mode |

Mode 1: Analyze

Algorithmic complexity analysis for files or functions.

Analyze Step 1: Scan

Run the complexity estimator script:

```sh
uv run python skills/performance-profiler/scripts/complexity-estimator.py <path>
```

Parse the JSON output. If the script fails, fall back to manual AST-level analysis.

Analyze Step 2: Classify

For each function in scope:

  1. Identify loop nesting depth, recursion patterns, data structure operations
  2. Map to Big-O classification using references/complexity-patterns.md
  3. Score hotspot risk: nesting depth * call frequency * data size sensitivity
  4. Flag functions with O(n^2) or worse in hot paths
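
The internals of `complexity-estimator.py` are not shown here; a minimal sketch of the manual AST-level fallback, assuming maximum loop nesting depth as the proxy for complexity class (function and helper names are illustrative):

```python
import ast

def max_loop_depth(source: str) -> dict[str, int]:
    """Map each function name in the source to its maximum loop nesting depth."""
    tree = ast.parse(source)
    return {
        node.name: _depth(node, 0)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }

def _depth(node: ast.AST, current: int) -> int:
    # Recurse through children, incrementing the depth at each for/while loop.
    deepest = current
    for child in ast.iter_child_nodes(node):
        nxt = current + 1 if isinstance(child, (ast.For, ast.While)) else current
        deepest = max(deepest, _depth(child, nxt))
    return deepest

code = """
def pairwise(items):
    for a in items:          # depth 1
        for b in items:      # depth 2 -> candidate O(n^2) hotspot
            yield a, b
"""
print(max_loop_depth(code))  # {'pairwise': 2}
```

Depth 2 or more flags a function for closer review; actual complexity still depends on what each loop iterates over, so this is a screening heuristic, not a classification.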

Analyze Step 3: Report

Present findings as a table:

| Function | Estimated Complexity | Evidence | Hotspot Risk | Recommendation |
| --- | --- | --- | --- | --- |

Include trade-off analysis for each recommendation.

Mode 2: Profile

Interpret textual profiler output pasted by the user.

Profile Step 1: Parse

Run the profile parser script on the user-provided data:

```sh
uv run python skills/performance-profiler/scripts/profile-parser.py --input <file>
```

If the data is pasted inline, save it to a temp file first. Parse the JSON output.

Profile Step 2: Identify Hotspots

From parsed data:

  1. Rank functions by cumulative time (top 10)
  2. Identify functions with high call count but low per-call time (overhead candidates)
  3. Identify functions with low call count but high per-call time (optimization candidates)
  4. Check for I/O-bound vs CPU-bound patterns (wall time vs CPU time ratio)
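
As an illustration of this ranking (the actual `profile-parser.py` is not shown), the stdlib `cProfile`/`pstats` pair exposes the raw data directly: `pstats.Stats.stats` maps `(file, line, function)` to `(primitive calls, total calls, total time, cumulative time, callers)`. The profiled functions below are toy examples:

```python
import cProfile
import pstats

def slow_sum(n):
    return sum(i * i for i in range(n))

def handler():
    return [slow_sum(50_000) for _ in range(10)]

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

stats = pstats.Stats(profiler)
# Rank by cumulative time (index 3): parents of hotspots float to the top.
ranked = sorted(stats.stats.items(), key=lambda kv: kv[1][3], reverse=True)
for (filename, lineno, func), (cc, nc, tt, ct, callers) in ranked[:5]:
    print(f"{func:30s} calls={nc:<6d} tottime={tt:.4f}s cumtime={ct:.4f}s")
```

High total calls with low per-call `tottime` suggests call overhead; few calls with high `cumtime` marks an optimization candidate.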

Profile Step 3: Recommend

For each hotspot, provide:

  • Root cause hypothesis with evidence from the profiler data
  • Optimization approach with expected impact range
  • Trade-offs and risks of the optimization
  • Reference to relevant anti-patterns from references/anti-patterns.md

Mode 3: Cache

Design caching strategies for a described system.

Cache Step 1: Understand Access Patterns

Ask the user, or infer from the code:

  1. Read/write ratio
  2. Data freshness requirements (TTL tolerance)
  3. Cache size constraints
  4. Consistency requirements (eventual vs strong)
  5. Eviction pressure (working set vs cache capacity)

Cache Step 2: Design Strategy

Use references/caching-strategies.md decision tree:

| Factor | LRU | LFU | TTL | Write-Through | Write-Back |
| --- | --- | --- | --- | --- | --- |
| Read-heavy, stable working set | Good | Best | OK | -- | -- |
| Write-heavy | -- | -- | -- | Safe | Fast |
| Strict freshness | -- | -- | Best | Best | Risky |
| Memory-constrained | Best | Good | OK | -- | -- |
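
A bounded cache combining two rows of the matrix (LRU eviction for memory pressure, TTL for freshness) can be sketched with an `OrderedDict`; the class and parameter names are illustrative, not part of the skill spec:

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """LRU eviction plus per-entry TTL: bounded memory and bounded staleness."""

    def __init__(self, max_size: int, ttl_seconds: float):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._store: OrderedDict = OrderedDict()  # key -> (value, expires_at)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:      # stale entry: treat as a miss
            del self._store[key]
            return default
        self._store.move_to_end(key)            # mark as most recently used
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:    # evict the least recently used
            self._store.popitem(last=False)

cache = TTLLRUCache(max_size=2, ttl_seconds=60)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a" so "b" becomes the LRU entry
cache.put("c", 3)      # capacity exceeded: "b" is evicted
print(cache.get("b"))  # None
```

TTL handles invalidation passively (entries expire on read); systems needing active invalidation on writes still need an explicit delete path.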

Cache Step 3: Specify

Deliver: eviction policy, write policy, invalidation strategy, warm-up approach, monitoring recommendations. Include capacity planning formula.

Mode 4: Benchmark

Design benchmarks and review methodology.

Benchmark Step 1: Generate Skeleton

Run the benchmark designer script:

```sh
uv run python skills/performance-profiler/scripts/benchmark-designer.py --function <signature> --language <lang>
```

Parse the JSON output for setup code, benchmark code, iteration count, and warmup.

Benchmark Step 2: Review Methodology

Validate against benchmark best practices:

  1. Warmup period sufficient to stabilize JIT/caches
  2. Iteration count provides statistical significance
  3. Measurement excludes setup/teardown overhead
  4. Environment controlled (no interference from other processes)
  5. Results include variance/percentiles, not just mean
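
A skeleton satisfying checks 1, 2, 3, and 5 might look like the sketch below (iteration counts are illustrative; check 4, environment control, lies outside the code):

```python
import statistics
import time

def benchmark(fn, *, warmup=100, iterations=1_000):
    """Per-call wall-time samples with a warmup phase; reports the
    distribution (median, p95, stdev) rather than a bare mean."""
    for _ in range(warmup):                       # 1. stabilize caches/JITs
        fn()
    samples = []
    for _ in range(iterations):                   # 2. enough samples for stats
        start = time.perf_counter()               # 3. timing excludes setup
        fn()
        samples.append(time.perf_counter() - start)
    return {                                      # 5. variance, not just mean
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "p95": statistics.quantiles(samples, n=100)[94],
        "stdev": statistics.stdev(samples),
    }

result = benchmark(lambda: sorted(range(1_000), reverse=True))
print({k: f"{v * 1e6:.1f}us" for k, v in result.items()})
```

Reporting median and p95 side by side exposes tail latency that a mean alone hides; a large mean-median gap signals outliers worth investigating.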

Benchmark Step 3: Deliver

Provide complete benchmark code with methodology notes, expected metrics, and interpretation guide.

Mode 5: Regression

Assess performance regression risk from a code diff.

Regression Step 1: Collect Diff

If a path is provided, read the diff. If a git range is provided, run `git diff` over it. Identify the changed functions and their call sites.

Regression Step 2: Assess Risk

For each changed function:

| Risk Factor | Weight | Check |
| --- | --- | --- |
| Complexity increase | 3x | Loop nesting added, algorithm changed |
| Hot path change | 3x | Function called in request/render path |
| Data structure change | 2x | Collection type or size assumptions changed |
| I/O pattern change | 2x | New network/disk calls, removed batching |
| Memory allocation | 1x | New allocations in loops, larger buffers |

Risk score = sum of (weight * severity). Map to LOW/MEDIUM/HIGH/CRITICAL.
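
A sketch of the scoring, assuming a 0-3 severity scale and illustrative band thresholds (neither the scale nor the thresholds is fixed by the table above):

```python
# Weights from the risk-factor table; key names are illustrative.
WEIGHTS = {
    "complexity_increase": 3,
    "hot_path_change": 3,
    "data_structure_change": 2,
    "io_pattern_change": 2,
    "memory_allocation": 1,
}

def regression_risk(severities: dict[str, int]) -> tuple[int, str]:
    """Weighted sum of per-factor severities, mapped to a risk band."""
    score = sum(WEIGHTS[f] * s for f, s in severities.items())
    # Band thresholds are assumptions for illustration, not part of the spec.
    if score >= 18:
        band = "CRITICAL"
    elif score >= 10:
        band = "HIGH"
    elif score >= 4:
        band = "MEDIUM"
    else:
        band = "LOW"
    return score, band

score, band = regression_risk(
    {"complexity_increase": 2, "hot_path_change": 2, "memory_allocation": 1}
)
print(score, band)  # 13 HIGH
```
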

Regression Step 3: Report

Present regression risk matrix with:

  • Per-function risk assessment with evidence
  • Aggregate risk score for the diff
  • Recommended benchmark targets before merging
  • Specific measurements to validate (what to profile and where)

Mode 6: Leak-Patterns

Scan for common memory leak patterns. Static analysis only -- NOT actual leak detection.

Leak Step 1: Scan

Read target files and check against patterns in references/leak-patterns.md:

  • Event listener accumulation without cleanup
  • Closure-captured references preventing GC
  • Growing collections without bounds (unbounded caches, append-only lists)
  • Circular references in reference-counted languages
  • Resource handles not closed (files, connections, cursors)
  • Global state accumulation

Leak Step 2: Classify

For each potential leak pattern found:

| Pattern | Language | Severity | False Positive Risk |
| --- | --- | --- | --- |

Leak Step 3: Report

Present findings with code citations, explain why each pattern risks leaking, and suggest fixes. Acknowledge that static analysis has high false positive rates -- recommend actual profiling tools for confirmation.

Scaling Strategy

| Scope | Strategy |
| --- | --- |
| Single function | Direct analysis, inline report |
| Single file (< 500 LOC) | Script-assisted analysis, structured report |
| Multiple files / module | Parallel subagents per file, consolidated report |
| Full codebase | Prioritize entry points and hot paths, sample-based analysis |

Reference Files

Load ONE reference at a time. Do not preload all references into context.

| File | Content | Read When |
| --- | --- | --- |
| references/complexity-patterns.md | Code pattern to Big-O mapping with examples | Mode 1 (Analyze) |
| references/caching-strategies.md | Caching decision tree, eviction policies, trade-offs | Mode 3 (Cache) |
| references/anti-patterns.md | Performance anti-patterns catalog (N+1, unbounded alloc, etc.) | Mode 2 (Profile), Mode 5 (Regression), Mode 6 (Leak-Patterns) |
| references/leak-patterns.md | Memory leak patterns by language (Python, JS, Go, Java) | Mode 6 (Leak-Patterns) |
| references/profiler-guide.md | Profiler output interpretation, flame graph reading | Mode 2 (Profile) |
| references/benchmark-methodology.md | Benchmark design best practices, statistical methods | Mode 4 (Benchmark) |

| Script | When to Run |
| --- | --- |
| scripts/complexity-estimator.py | Mode 1 -- static complexity analysis via AST |
| scripts/profile-parser.py | Mode 2 -- parse cProfile/pstats textual output to JSON |
| scripts/benchmark-designer.py | Mode 4 -- generate benchmark skeleton from function signature |

| Template | When to Render |
| --- | --- |
| templates/dashboard.html | After any mode -- inject results JSON into the data tag |

Data Files

| File | Content |
| --- | --- |
| data/complexity-patterns.json | Code pattern to Big-O mapping (machine-readable) |
| data/caching-strategies.json | Caching decision tree (machine-readable) |
| data/anti-patterns.json | Performance anti-patterns catalog (machine-readable) |

Critical Rules

  1. Never claim to measure performance — this skill provides analysis, not measurement
  2. Every recommendation must include trade-offs — no "just do X" advice
  3. Always acknowledge uncertainty in complexity estimates — static analysis has limits
  4. Never recommend premature optimization — confirm the code is actually on a hot path first
  5. Profiler output interpretation must cite specific data points, not general principles
  6. Cache strategy recommendations must address invalidation — "cache invalidation is hard" is not a strategy
  7. Benchmark designs must include warmup, statistical significance, and variance reporting
  8. Regression risk assessment must trace to specific code changes, not general concerns
  9. Leak pattern scanning is pattern-matching only — always recommend actual profiling for confirmation
  10. Load ONE reference file at a time — do not preload all references into context
  11. Present findings with evidence before suggesting fixes (approval gate)
  12. Anti-pattern findings require code citation [file:line] — no generic warnings