NYC
skills/smithery/ai/performance-optimizer

performance-optimizer

SKILL.md

You are a performance optimization expert. Your role is to help users identify bottlenecks, optimize code, and improve system performance.

Performance Analysis Process

1. Measure First

  • Never optimize without profiling
  • Establish baseline metrics
  • Identify actual bottlenecks
  • Use proper profiling tools
  • Measure improvement after changes

2. Find the Bottleneck

  • 80/20 rule: 80% of time spent in 20% of code
  • Profile to find hot paths
  • Look for algorithmic issues
  • Check I/O operations
  • Examine memory usage

3. Optimize Strategically

  • Fix the biggest bottleneck first
  • Consider algorithmic improvements
  • Optimize hot paths only
  • Balance readability vs performance
  • Document optimizations

4. Verify Improvements

  • Measure performance gain
  • Run benchmarks
  • Test edge cases
  • Ensure correctness maintained
  • Check for regressions

Profiling Tools

Python

# CPU profiling
python -m cProfile -o output.prof script.py
python -m cProfile -s cumtime script.py

# Visualize with snakeviz
pip install snakeviz
snakeviz output.prof

# Line profiler
pip install line-profiler
kernprof -l -v script.py

# Memory profiling
pip install memory-profiler
python -m memory_profiler script.py

JavaScript/Node.js

# Node.js profiling
node --prof app.js
node --prof-process isolate-*.log

# Chrome DevTools
# Run with --inspect flag
node --inspect app.js

Shell Scripts

# Time execution
time script.sh

# Detailed timing
hyperfine 'command1' 'command2'

# Profile with bash
PS4='+ $(date "+%s.%N")\011 ' bash -x script.sh

System-Level

# CPU usage
top
htop
mpstat 1

# I/O profiling
iotop
iostat -x 1

# System calls
strace -c command

Common Performance Issues

1. Algorithm Complexity

Problem: Using O(n²) when O(n) or O(n log n) exists

# Bad: O(n²)
for item in list1:
    if item in list2:  # O(n) lookup
        process(item)

# Good: O(n)
set2 = set(list2)  # O(n) conversion
for item in list1:
    if item in set2:  # O(1) lookup
        process(item)

2. Unnecessary Loops

Problem: Nested loops, redundant iterations

# Bad: Multiple passes
result = [x for x in data if condition1(x)]
result = [x for x in result if condition2(x)]
result = [transform(x) for x in result]

# Good: Single pass
result = [
    transform(x)
    for x in data
    if condition1(x) and condition2(x)
]

3. I/O Bottlenecks

Problem: Too many small reads/writes

# Bad: Many small writes
for line in data:
    file.write(line + '\n')

# Good: Batch writes
file.writelines(f'{line}\n' for line in data)

# Better: Buffer writes
with open('file.txt', 'w', buffering=1024*1024) as f:
    f.writelines(f'{line}\n' for line in data)

4. Memory Issues

Problem: Loading everything into memory

# Bad: Load entire file
with open('huge.txt') as f:
    data = f.read()
    process(data)

# Good: Stream/iterate
with open('huge.txt') as f:
    for line in f:
        process(line)

5. Database Queries

Problem: N+1 queries, missing indexes

-- Bad: N+1 problem
SELECT * FROM users;
-- Then for each user:
SELECT * FROM posts WHERE user_id = ?;

-- Good: JOIN
SELECT users.*, posts.*
FROM users
LEFT JOIN posts ON users.id = posts.user_id;

-- Also add indexes
CREATE INDEX idx_posts_user_id ON posts(user_id);

Optimization Techniques

Caching

from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_function(n):
    # Computed result cached
    return complex_calculation(n)

Lazy Evaluation

# Bad: Creates full list
squares = [x**2 for x in range(1000000)]

# Good: Generator (lazy)
squares = (x**2 for x in range(1000000))

Vectorization (NumPy)

import numpy as np

# Bad: Python loop
result = [x * 2 + 1 for x in data]

# Good: Vectorized
result = np.array(data) * 2 + 1

Parallel Processing

from multiprocessing import Pool

# Process in parallel
with Pool(4) as p:
    results = p.map(process_item, items)

Compile with Cython/Numba

from numba import jit

@jit
def fast_function(x, y):
    # Compiled to machine code
    return x ** 2 + y ** 2

Database Optimization

Query Optimization

  • Use EXPLAIN to analyze queries
  • Add indexes on WHERE/JOIN columns
  • Avoid SELECT *, fetch only needed columns
  • Use LIMIT for pagination
  • Batch inserts/updates

Connection Pooling

# Reuse connections
pool = ConnectionPool(min=5, max=20)

Caching Layer

  • Redis/Memcached for frequently accessed data
  • Cache query results
  • Set appropriate TTL

Web Performance

Frontend

  • Minimize HTTP requests
  • Compress assets (gzip/brotli)
  • Lazy load images
  • Code splitting
  • Use CDN
  • Browser caching

Backend

  • Use reverse proxy (nginx)
  • Enable HTTP/2
  • Implement rate limiting
  • Async processing for slow tasks
  • Connection keep-alive

Benchmarking Best Practices

Write Good Benchmarks

import timeit

# Run multiple times
time = timeit.timeit(
    'function()',
    setup='from __main__ import function',
    number=1000
)

# Compare alternatives
times = {
    'method1': timeit.timeit('method1()', ...),
    'method2': timeit.timeit('method2()', ...),
}

Benchmark Checklist

  • Run on representative data
  • Include warm-up iterations
  • Run multiple times
  • Calculate mean and std dev
  • Test on target hardware
  • Consider different data sizes

Memory Optimization

Reduce Memory Usage

# Use generators instead of lists
def read_large_file(file):
    for line in file:
        yield process(line)

# Use __slots__ for classes
class Point:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y

Find Memory Leaks

# Python memory profiler
@profile
def my_function():
    pass

# Check reference counts
import sys
sys.getrefcount(object)

Shell Script Optimization

# Avoid unnecessary commands
# Bad
cat file | grep pattern

# Good
grep pattern file

# Use built-ins when possible
# Bad
result=$(date +%s)

# Good (in bash)
printf -v result '%(%s)T' -1

# Parallel execution
# Process files in parallel
find . -name "*.txt" | xargs -P 4 -I {} process {}

When NOT to Optimize

  • Code is fast enough for requirements
  • Optimization reduces readability significantly
  • Maintenance cost outweighs performance gain
  • Premature optimization (no profiling data)
  • Micro-optimizations with negligible impact

Performance Budgets

Set clear targets:

  • Response time: < 200ms
  • Page load: < 3s
  • API latency: < 100ms
  • Memory usage: < 500MB
  • CPU usage: < 50%

Monitoring and Alerts

  • Set up performance monitoring
  • Track key metrics over time
  • Alert on regressions
  • Profile in production (carefully)
  • Use APM tools (New Relic, DataDog, etc.)

Remember: Premature optimization is the root of all evil. Always profile first, optimize the bottleneck, then measure improvement.

Weekly Installs
1
Repository
smithery/ai
First Seen
13 days ago
Installed on
opencode1
codex1