# perf-test

**Performance Test.** Set up, run, and analyze performance tests with an iterative improvement cycle.
## Usage

```
/perf-test src/utils/parser.ts   # Profile a file/module
/perf-test calculateTotal        # Profile a specific function
/perf-test /api/users            # Load test an endpoint
/perf-test checkout-flow         # E2E scenario test
/perf-test                       # Ask what to test
```
## Gotchas
- First runs include JIT/cold-start overhead. Always configure warmup passes in the proposal — without them, baseline measurements are inflated and improvement percentages are misleading.
- Before/after comparisons are invalid if run on different machines, different concurrency levels, or with different iteration counts. Lock test configuration between runs.
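As a concrete sketch of the warmup rule (function and variable names are hypothetical, not part of the skill), a measurement loop might discard its first passes before recording any samples:

```python
import time

def benchmark(fn, warmup=50, iterations=1000):
    """Time `fn`, discarding warmup passes so cold-start cost is excluded."""
    for _ in range(warmup):              # warmup passes: results thrown away
        fn()
    samples = []
    for _ in range(iterations):          # measured passes
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

samples = benchmark(lambda: sum(range(100)))
print(len(samples))  # 1000 measured samples, warmup excluded
```

Keeping `warmup` and `iterations` fixed between runs also satisfies the locked-configuration rule above.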
## Workflow

### Phase 1: Understanding

Ask clarifying questions:
1. **What type of test?**
   - Code profiling (execution time, memory)
   - Load testing (requests/sec, latency, concurrency)
   - E2E scenario (full workflow with state setup)
2. **What metrics matter?**
   - Execution time / latency
   - Throughput (ops/sec, requests/sec)
   - Memory usage
   - P50/P95/P99 latencies
3. **What are the success criteria?**
   - Specific thresholds (e.g., <100ms p95)
   - Relative improvement (e.g., 20% faster)
   - No regressions from baseline
Detect project context:

```shell
# Detect language/framework
ls package.json 2>/dev/null && echo "node"
ls requirements.txt pyproject.toml 2>/dev/null && echo "python"
ls go.mod 2>/dev/null && echo "go"
ls Cargo.toml 2>/dev/null && echo "rust"
```
Do NOT skip phases:
- "I can skip the proposal and just write the test" -- The proposal ensures we measure the right things
- "The results look fine, no need to analyze" -- Subtle regressions hide in apparently fine results
- "One iteration is enough" -- Statistical significance requires multiple runs
### Phase 2: Proposal

Present a testing plan:

```markdown
## Performance Test Proposal

**Target:** {target}
**Type:** {profiling | load testing | e2e scenario}
**Tool:** {selected tool}

### Metrics to Collect
- {metric 1}
- {metric 2}

### Test Configuration
- {iterations/duration}
- {concurrency if load test}
- {warmup settings}

### Success Criteria
- {threshold 1}
- {threshold 2}

### Files to Create
- {test file path}
- {config file if needed}
```

Proceed with this plan?
### Phase 3: Implementation
Select the appropriate template from references/templates.md based on language and test type:
- Code profiling: JS (benchmark.js), Python (timeit), Go (testing.B)
- Load testing: hey (universal), autocannon (Node.js)
- E2E scenarios: Custom scripts with state setup/teardown and per-step timing
Adapt the template to the specific target, configuring iterations, concurrency, and metrics collection as specified in the proposal.
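The real harness comes from references/templates.md. Purely to illustrate the shape of an adapted profiling template, a stdlib-only Python sketch (the `parse_row` target is hypothetical) might look like:

```python
import statistics
import timeit

def parse_row():                      # hypothetical target under test
    return [int(x) for x in "1,2,3,4".split(",")]

NUMBER = 10_000                       # calls per timed batch
raw = timeit.repeat(parse_row, repeat=5, number=NUMBER)   # 5 timed batches
per_call_us = [total / NUMBER * 1e6 for total in raw]     # microseconds/call

print(f"min:  {min(per_call_us):.3f} us")   # min batch: least noise-inflated
print(f"mean: {statistics.mean(per_call_us):.3f} us")
```

The `repeat` and `number` values stand in for the iteration counts agreed in the proposal.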
### Phase 4: Execution

Run tests and capture output:

```shell
# Node.js benchmark
node perf-tests/{target}.bench.js

# Python benchmark
python perf-tests/{target}_bench.py

# Go benchmark
go test -bench=. -benchmem ./...

# Load test
hey -n 1000 -c 50 {url}
```
### Phase 5: Analysis

Parse results and identify concerns:

```markdown
## Performance Results

### Summary

| Metric | Value | Threshold | Status |
|--------|-------|-----------|--------|
| Mean latency | 45ms | <100ms | Pass |
| P95 latency | 120ms | <150ms | Pass |
| P99 latency | 450ms | <200ms | Fail |
| Throughput | 850 req/s | >500 | Pass |

### Concerning Findings

1. **P99 latency spike** - 450ms exceeds 200ms threshold
   - Likely cause: {analysis}
   - Affected code: {file:line}
2. **Memory growth** - {if detected}
```
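Tail percentiles like the P95/P99 above can be computed from raw latency samples with a nearest-rank rule; a minimal Python sketch (the sample data is invented for illustration):

```python
def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) over raw samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))   # 1-based nearest rank
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 14, 90, 13, 12, 16, 450, 14]  # made-up samples
print(percentile(latencies_ms, 50))   # 14
print(percentile(latencies_ms, 99))   # 450
```

Note that with small sample counts the P99 collapses to the single worst observation, which is one more reason to run many iterations.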
### Phase 6: Recommendations

```markdown
## Optimization Recommendations

### High Impact

1. **{Recommendation}**
   - Current: {metric}
   - Expected improvement: {estimate}
   - Implementation: {brief description}

### Medium Impact

2. **{Recommendation}**
...
```

Which recommendations would you like to implement?
### Phase 7: Improvement Cycle

After user selects recommendations:

1. Implement the accepted optimizations
2. Re-run the same performance tests
3. Compare results:

```markdown
## Before/After Comparison

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Mean latency | 45ms | 32ms | -29% |
| P95 latency | 120ms | 85ms | -29% |
| P99 latency | 450ms | 150ms | -67% |
| Throughput | 850/s | 1100/s | +29% |

### Summary
- {X} metrics improved
- {Y} metrics unchanged
- {Z} metrics regressed (if any)
```

4. Ask if further optimization is needed
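Classifying each metric as improved, unchanged, or regressed can be sketched as below (the 5% noise tolerance is an arbitrary assumption, not part of the skill; sample numbers are from the comparison above):

```python
def pct_change(before, after):
    """Signed percent change relative to the baseline; negative means a drop."""
    return (after - before) / before * 100

def classify(before, after, lower_is_better, tolerance=5.0):
    """Label a metric change, treating |change| <= tolerance% as noise."""
    change = pct_change(before, after)
    if abs(change) <= tolerance:
        return "unchanged"
    improved = change < 0 if lower_is_better else change > 0
    return "improved" if improved else "regressed"

print(classify(45, 32, lower_is_better=True))      # improved (mean latency)
print(classify(850, 1100, lower_is_better=False))  # improved (throughput)
```

The `lower_is_better` flag matters: a -29% change is an improvement for latency but would be a regression for throughput.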
## Documentation

Save the test setup for future replication in `docs/perf/{target}/README.md`:

```markdown
# Performance Tests: {target}

## Setup
{installation instructions}

## Running Tests
{command to run}

## Baseline Metrics

| Metric | Value | Date |
|--------|-------|------|
| {metric} | {value} | {date} |

## Thresholds
- {threshold 1}
- {threshold 2}

## History

| Date | Change | Impact |
|------|--------|--------|
| {date} | {optimization} | {improvement %} |
```
## Tool Reference
| Language | Profiling | Load Testing | E2E Scenario |
|---|---|---|---|
| JS/TS | benchmark.js, console.time | autocannon, k6 | custom scripts, k6 scenarios |
| Python | timeit, cProfile, pytest-benchmark | locust | locust sequences, pytest |
| Go | testing.B (built-in) | hey, vegeta | custom test harness |
| Rust | criterion | hey | custom scripts |
| Any CLI | hyperfine | hey, ab | shell scripts with timing |
## Examples

Load test an API endpoint with before/after comparison:

```
/perf-test /api/orders
```

Sets up a load test using hey or autocannon against the endpoint, collects latency percentiles and throughput, then runs an improvement cycle to compare metrics before and after optimizations are applied.

Profile a slow parser function:

```
/perf-test parseCSV
```

Creates a benchmark harness for the function, runs it with multiple iterations to get statistically significant timing data, and identifies the hotspot causing the slowdown along with optimization recommendations.
## Troubleshooting
**Benchmark results are inconsistent across runs**

Solution: Increase the number of iterations and add warmup runs to reduce cold-start variance. Close other resource-intensive processes and pin CPU frequency if possible to minimize OS-level noise.
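One way to decide whether a run is stable enough or needs more iterations is the coefficient of variation; a minimal sketch (the 5% threshold is an arbitrary assumption, and the sample lists are invented):

```python
import statistics

def is_stable(samples, cv_threshold=0.05):
    """True when the coefficient of variation (stdev/mean) is under threshold."""
    cv = statistics.stdev(samples) / statistics.mean(samples)
    return cv < cv_threshold

print(is_stable([100, 101, 99, 100, 102]))   # True: tight cluster
print(is_stable([100, 180, 60, 140, 90]))    # False: rerun with more iterations
```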
**Performance tool not available in project**

Solution: Check the Tool Reference table above for alternatives in your language. If no project dependency exists, use language-built-in options (e.g., console.time for JS, timeit for Python, testing.B for Go) or install a standalone tool like hyperfine or hey that requires no project integration.
## Notes
- Always run multiple iterations for statistical significance
- Include warmup runs to avoid cold-start bias
- For load tests, ensure the service is running locally or in a controlled environment
- For E2E scenarios, reset state between iterations for reproducible results
- Use isolated test databases, never production data
- Document baseline metrics before optimizing
- Compare against the same test configuration for valid before/after comparison