Python Performance Optimization

Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.

When to Use This Skill

  • Code runs too slowly for production requirements
  • High CPU usage or memory consumption issues
  • Need to reduce API response times or batch processing duration
  • Application fails to scale under load
  • Optimizing data processing pipelines or scientific computing
  • Reducing cloud infrastructure costs through efficiency gains
  • Profile-guided optimization after measuring performance bottlenecks

Core Concepts

The Golden Rule: Never optimize without profiling first. Typically, the large majority of execution time is concentrated in a small fraction of the code (the classic 80/20 heuristic).

Optimization Hierarchy (in priority order):

  1. Algorithm complexity - O(n²) → O(n log n) yields gains that grow with input size
  2. Data structure choice - List → Set for lookups (O(n) → O(1) average case; orders of magnitude faster on large collections)
  3. Language features - Comprehensions, built-ins, generators
  4. Caching - Memoization for repeated calculations
  5. Compiled extensions - NumPy, Numba, Cython for hot paths
  6. Parallelism - Multiprocessing for CPU-bound work

Key Principle: Algorithmic improvements almost always beat micro-optimizations.

Quick Reference

Load detailed guides for specific optimization areas:

| Task | Load reference |
| --- | --- |
| Profile code and find bottlenecks | skills/python-performance-optimization/references/profiling.md |
| Algorithm and data structure optimization | skills/python-performance-optimization/references/algorithms.md |
| Memory optimization and generators | skills/python-performance-optimization/references/memory.md |
| String concatenation and file I/O | skills/python-performance-optimization/references/string-io.md |
| NumPy, Numba, Cython, multiprocessing | skills/python-performance-optimization/references/acceleration.md |

Optimization Workflow

Phase 1: Measure

  1. Profile with cProfile - Identify slow functions
  2. Line profile hot paths - Find exact slow lines
  3. Memory profile - Check for memory bottlenecks
  4. Benchmark baseline - Record current performance
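Step 1 above can be sketched with the standard-library cProfile and pstats modules; `slow_function` here is a hypothetical stand-in for real application code:

```python
import cProfile
import io
import pstats

def slow_function(n):
    # Deliberately quadratic: stand-in for the real hot path
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_function(300)
profiler.disable()

# Print the top functions sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

The report shows per-function call counts and cumulative times, which tells you where line profiling is worth the effort.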

Phase 2: Analyze

  1. Check algorithm complexity - Is it O(n²) or worse?
  2. Evaluate data structures - Are you using lists for lookups?
  3. Identify repeated work - Can results be cached?
  4. Find I/O bottlenecks - Database queries, file operations

Phase 3: Optimize

  1. Improve algorithms first - Biggest impact
  2. Use appropriate data structures - Set/dict for O(1) lookups
  3. Apply caching - @lru_cache for expensive functions
  4. Use generators - For large datasets
  5. Leverage NumPy/Numba - For numerical code
  6. Parallelize - Multiprocessing for CPU-bound tasks

Phase 4: Validate

  1. Re-profile - Verify improvements
  2. Benchmark - Measure speedup quantitatively
  3. Test correctness - Ensure optimizations didn't break functionality
  4. Document - Explain why optimization was needed
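Steps 2 and 3 above can be combined in a small timeit harness; `lookup_list` and `lookup_set` are illustrative names standing in for a baseline and its optimized version:

```python
import timeit

def lookup_list(items, targets):
    # Baseline: O(n) membership test per target
    return sum(1 for t in targets if t in items)

def lookup_set(items, targets):
    # Optimized: one O(n) set build, then O(1) average-case lookups
    s = set(items)
    return sum(1 for t in targets if t in s)

items = list(range(10_000))
targets = list(range(0, 20_000, 2))

slow = timeit.timeit(lambda: lookup_list(items, targets), number=3)
fast = timeit.timeit(lambda: lookup_set(items, targets), number=3)
print(f"list: {slow:.4f}s  set: {fast:.4f}s  speedup: {slow / fast:.1f}x")

# Correctness check: both versions must agree before the speedup counts
assert lookup_list(items, targets) == lookup_set(items, targets)
```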

Common Optimization Patterns

Pattern 1: Replace List with Set for Lookups

```python
# Slow: O(n) linear scan
if item in large_list:  # Bad
    ...

# Fast: O(1) average-case hash lookup
if item in large_set:   # Good
    ...
```

Pattern 2: Use Comprehensions

```python
# Slower: repeated attribute lookup and method-call overhead
result = []
for i in range(n):
    result.append(i * 2)

# Faster (often ~20-40% on CPython; measure on your workload)
result = [i * 2 for i in range(n)]
```

Pattern 3: Cache Expensive Calculations

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_function(n):
    # Result cached automatically; arguments must be hashable
    return complex_calculation(n)
```

Pattern 4: Use Generators for Large Data

```python
# Memory inefficient: loads the entire file into a list
def read_file(path):
    with open(path) as f:
        return [line.strip() for line in f]

# Memory efficient: streams one line at a time
def read_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
```

Pattern 5: Vectorize with NumPy

```python
import numpy as np

# Pure Python: interpreted loop (typically hundreds of ms for 10^6 elements)
result = sum(i**2 for i in range(1_000_000))

# NumPy: vectorized in C, often ~100x faster; measure on your hardware
arr = np.arange(1_000_000, dtype=np.int64)  # explicit int64 avoids overflow on platforms that default to int32
result = np.sum(arr**2)
```

Common Mistakes to Avoid

  1. Optimizing before profiling - You'll optimize the wrong code
  2. Using lists for membership tests - Use sets/dicts instead
  3. String concatenation in loops - Use "".join() or StringIO
  4. Loading entire files into memory - Use generators
  5. N+1 database queries - Use JOINs or batch queries
  6. Ignoring built-in functions - They're C-optimized and fast
  7. Premature optimization - Focus on algorithmic improvements first
  8. Not benchmarking - Always measure improvements quantitatively
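Mistake 3 above, as a minimal before/after sketch (`concat_slow` and `concat_fast` are hypothetical names):

```python
def concat_slow(parts):
    # Worst case quadratic: each += may copy the accumulated string
    out = ""
    for p in parts:
        out += p
    return out

def concat_fast(parts):
    # Single pass: join computes the total length once and copies once
    return "".join(parts)

parts = [str(i) for i in range(1000)]
assert concat_slow(parts) == concat_fast(parts)
```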

Decision Tree

Start here: Profile with cProfile to find bottlenecks

Hot path is algorithm?

  • Yes → Check complexity, improve algorithm, use better data structures
  • No → Continue

Hot path is computation?

  • Numerical loops → NumPy or Numba
  • CPU-bound → Multiprocessing
  • Already fast enough → Done

Hot path is memory?

  • Large data → Generators, streaming
  • Many objects → __slots__, object pooling
  • Caching needed → @lru_cache or custom cache
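The __slots__ suggestion above, sketched minimally (`PointDict` and `PointSlots` are illustrative names). The saving comes from slotted instances not carrying a per-instance `__dict__`:

```python
class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ("x", "y")  # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PointDict(1, 2)
q = PointSlots(1, 2)
assert hasattr(p, "__dict__")
assert not hasattr(q, "__dict__")
```

With millions of small objects the per-instance saving (tens of bytes each on CPython) adds up; the trade-off is that new attributes cannot be added at runtime.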

Hot path is I/O?

  • Database → Batch queries, indexes, connection pooling
  • Files → Buffering, streaming
  • Network → Async I/O, request batching
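The batch-query point above can be illustrated with the standard-library sqlite3 module as an in-memory stand-in for a production database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database connection
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"user{i}") for i in range(1000)]

# N separate statements: one round trip per row (the N+1 anti-pattern)
# for row in rows:
#     conn.execute("INSERT INTO users VALUES (?, ?)", row)

# One batched call: the driver reuses a single prepared statement
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)
```

Over a network connection the batched version also collapses N round trips into one, which usually dominates the saving.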

Best Practices

  1. Profile before optimizing - Measure to find real bottlenecks
  2. Optimize algorithms first - O(n²) → O(n) beats micro-optimizations
  3. Use appropriate data structures - Set/dict for lookups, not lists
  4. Leverage built-ins - C-implemented built-ins are faster than pure Python
  5. Avoid premature optimization - Optimize hot paths identified by profiling
  6. Use generators for large data - Reduce memory usage with lazy evaluation
  7. Batch operations - Minimize overhead from syscalls and network requests
  8. Cache expensive computations - Use @lru_cache or custom caching
  9. Consider NumPy/Numba - Vectorization and JIT for numerical code
  10. Parallelize CPU-bound work - Use multiprocessing to utilize all cores
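Practice 10 above can be sketched with multiprocessing.Pool; `cpu_heavy` is a hypothetical stand-in for real CPU-bound work. The `__main__` guard is required so worker processes can import the module safely:

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for real CPU-bound work
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [100_000] * 8

    # Serial baseline
    serial = [cpu_heavy(n) for n in inputs]

    # Parallel: Pool() defaults to one worker per CPU core
    with Pool() as pool:
        parallel = pool.map(cpu_heavy, inputs)

    assert serial == parallel
```

Because each worker is a separate process, this sidesteps the GIL, but arguments and results are pickled between processes, so it pays off only when the work per task outweighs that overhead.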
