Latency Principles
Based on "Latency: Reduce delay in software systems" by Pekka Enberg (Manning, 2026).
This skill provides a systematic framework for minimizing delay in software systems, covering the entire stack from Physics and Hardware to Application Architecture and User Experience.
Foundational Concepts
- What is Latency?: The time delay between a cause and its observed effect.
- Latency vs. Bandwidth: You can always add more bandwidth (more links), but you are stuck with bad latency unless you optimize the path.
- Impact on User Experience (UX): < 100ms (immediate), < 1s (instant), > 10s (slow).
- Measuring Correctly: Use percentiles (p95, p99) to capture tail latency. Avoid averages. Avoid Coordinated Omission by using fixed-interval benchmarking.
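To make the fixed-interval approach concrete, here is a minimal sketch (do_request is a hypothetical callable standing in for the system under test): latency is measured from each request's scheduled start time, so time spent queued behind a stall is counted rather than omitted.

```python
import time

def fixed_interval_benchmark(do_request, rate_hz: float, duration_s: float):
    """Issue requests on a fixed schedule and measure latency from the
    *intended* start time, so queuing delay is not silently omitted."""
    interval = 1.0 / rate_hz
    start = time.perf_counter()
    latencies = []
    for i in range(int(duration_s * rate_hz)):
        scheduled = start + i * interval
        now = time.perf_counter()
        if scheduled > now:
            time.sleep(scheduled - now)  # wait for the slot; never skip slots
        do_request()
        # Measuring against the schedule captures the time this request
        # spent waiting behind a stalled predecessor.
        latencies.append(time.perf_counter() - scheduled)
    return latencies
```

Timestamping from the actual send moment instead would let a stalled system quietly issue fewer requests and report only its good samples, which is the essence of Coordinated Omission.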
Modeling Performance
Use these laws to establish theoretical bounds and size systems:
- Little's Law: Concurrency (C) = Throughput (T) * Latency (L)
  - Usage: Calculate the concurrency required to sustain a target throughput at a given latency.
  - Example: If p99 latency is 50 ms and you need 1000 RPS, you need 1000 * 0.05 = 50 concurrent execution units (worked through in the sketch after this list).
- Amdahl's Law: Speedup = 1 / ((1 - P) + (P / N))
  - P: Portion of the program that is parallelizable.
  - N: Number of processors.
  - Usage: Understand the limits of parallelization. If 50% of your code is serial, the maximum speedup is 2x, regardless of how many cores you add.
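A minimal sketch of both laws as Python functions, reproducing the numbers above:

```python
def littles_law_concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: C = T * L."""
    return throughput_rps * latency_s

def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's Law: Speedup = 1 / ((1 - P) + P / N)."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_processors)

print(littles_law_concurrency(1000, 0.050))  # 50.0 concurrent units
print(amdahl_speedup(0.5, 1_000_000))        # ~2.0: the serial half caps speedup
```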
Measurement & Visualization
Visualizing latency distributions is critical for identifying tail behavior:
- Histograms: Show frequency of samples. Good for seeing the "mode" (most common latency) and the spread.
- HDR Histograms: Plot latency against percentiles (p50, p99, etc.) on a log scale. Essential for p99.9+ analysis.
- eCDF (Empirical Cumulative Distribution Function): A smooth curve showing the probability that a request completes within a given time. Directly answers SLA compliance questions.
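As an illustration, a minimal eCDF sketch (the sample data is made up; scripts/visualize_latency.py produces the real plots):

```python
import numpy as np

def ecdf(samples):
    """Return (x, y) where y[i] is the fraction of samples <= x[i]."""
    x = np.sort(np.asarray(samples))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

samples_ms = [12, 15, 14, 200, 16, 13, 18, 15, 17, 450]  # made-up data
x, y = ecdf(samples_ms)
# "What fraction of requests complete within 100 ms?" is an SLA question:
print(np.searchsorted(x, 100, side="right") / len(x))  # 0.8
```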
Tooling:
- Use scripts/ping_collector.py to gather data without Coordinated Omission.
- Use scripts/visualize_latency.py to generate Histogram, HDR, and eCDF plots.
Quick Decision Guide
| Symptom | Probable Cause | Recommended Strategy |
|---|---|---|
| High Avg Latency | Sequential processing / Slow I/O | Concurrency (Async I/O) or Partitioning |
| High Tail Latency (p99) | Lock contention / GC / Neighbor noise | Wait-free Sync (Atomics) or Request Hedging |
| Network Slowness | Distance / Protocol overhead | Colocation (Edge) or Binary Serialization (Protobuf) |
| Database Load | Hot keys / Complex queries | Caching (Read-through) or Materialized Views |
| Slow Writes | ACID guarantees / Indexing | Write-Behind Caching or Sharding |
| High CPU Usage | O(n^2) logic / JSON parsing | Algorithmic Fixes or Protobuf/FlatBuffers |
| Micro-stutters | GC pauses / OS interrupts | Object Pooling or Interrupt Affinity |
| Lock Contention | Mutex bottleneck | Wait-free Sync (Atomics) |
| "It feels slow" | UI blocking on network | Optimistic Updates or Prefetching |
| Measurement Looks Too Good | Coordinated Omission | Fixed-Interval Benchmarking |
Part 1: Fundamentals (Start Here)
Core theory and diagnostic approaches.
- references/principles.md: Definitions of Little's Law, Amdahl's Law, Tail Latency, Compounding Latency, and Latency Distribution.
- references/diagnostic_checklist.md: Step-by-step checklist for identifying bottlenecks across Hardware, OS, and Runtime.
Part 2: Data Layer (Access Optimization)
Optimizing data access is often the highest-leverage activity for reducing latency.
- Colocation (Pattern: Move Compute to Data):
- Edge Computing: Use Near Edge (points of presence) or Far Edge (on-device/IoT) to eliminate geographical distance.
- Intranode: Colocate protocol handlers with application threads. Turn off Nagle's Algorithm (TCP_NODELAY) to prevent packet batching delays (see the socket sketch below).
- Kernel-bypass: Use techniques like DPDK to eliminate OS stack overhead.
- Replication (Pattern: Trade Consistency for Latency):
- Leaderless/Multi-Leader: Allows local writes to reduce write latency, at the cost of complex conflict resolution.
- Consistency Models: Choose Eventual Consistency or Read-your-writes to avoid synchronous coordination (Strong Consistency) overhead.
- Partitioning (Pattern: Divide and Conquer):
- Horizontal Sharding: Splitting data to increase parallel throughput.
- Request Routing: Use Direct Routing (client-side) to avoid the extra hop of a proxy.
- Mitigate Hot Partitions: Use over-partitioning to balance skewed workloads.
- Caching (Pattern: Memory is Faster than Disk):
- Write-Behind Caching: Asynchronous writes to the data store to hide write latency (see the cache sketch below).
- Policy Selection: Use SIEVE or LRU for eviction.
- Materialized Views: Precompute complex queries to eliminate runtime processing work.
See references/data_patterns.md for detailed implementation strategies.
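To illustrate the intranode colocation point, disabling Nagle's Algorithm on a Python socket is one setsockopt call; the endpoint below is a placeholder:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's Algorithm: send small writes immediately instead of
# batching them while waiting for outstanding ACKs.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(("127.0.0.1", 8080))  # placeholder endpoint
```

The trade-off is more small packets on the wire; for request/response protocols that send a small message and wait for the reply, the trade is usually worth it.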
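And a minimal write-behind sketch: put() returns as soon as the in-memory copy is updated, while a background thread flushes to the backing store (store_write is an assumed function, e.g. a database upsert):

```python
import queue
import threading

class WriteBehindCache:
    """Writes return after updating memory; a worker flushes to the store."""
    def __init__(self, store_write):
        self.data = {}
        self.pending = queue.Queue()
        self.store_write = store_write  # assumed: persists one key/value pair
        threading.Thread(target=self._flusher, daemon=True).start()

    def put(self, key, value):
        self.data[key] = value          # fast path: memory only
        self.pending.put((key, value))  # durability is deferred

    def get(self, key):
        return self.data.get(key)

    def _flusher(self):
        while True:
            key, value = self.pending.get()
            self.store_write(key, value)  # slow I/O off the caller's path
```

The latency win costs durability: a crash loses writes still sitting in the queue, so write-behind suits data you can replay or afford to lose.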
Part 3: Compute Layer (Logic Acceleration)
Optimizing processing logic and synchronization to eliminate overhead.
- Eliminating Work (Pattern: The Fastest Code is Code that Doesn't Run):
- Algorithmic Complexity: Replace O(n) scans with O(log n) trees or O(1) hash maps.
- Zero-Copy Serialization: Use FlatBuffers instead of JSON to eliminate the parsing/unpacking step.
- Memory Tuning: Avoid dynamic allocation (malloc/new) in hot paths. Use Object Pooling or Stack Allocation to prevent GC pauses or allocator lock contention (see the pooling sketch below).
- Precomputation: Move work from runtime to build time or startup.
- Wait-Free Sync (Pattern: Avoid Context Switches):
- Mutual Exclusion Problems: Locks (Mutexes) cause expensive OS context switches (~µs).
- Atomics: Use hardware primitives (CAS, fetch_add) for lock-free state updates.
- Wait-Free Structures: Implement Ring Buffers (SPSC) using memory barriers to allow threads to communicate without ever blocking (see the ring-buffer sketch below).
- Exploiting Concurrency (Pattern: Use Every Core):
- Thread-per-core: Pin threads to physical cores to maximize cache locality and eliminate scheduler overhead.
- Concurrency Models: Use Coroutines/Fibers for lightweight userspace multitasking or Actor Model for shared-nothing message passing.
- SIMD: Use "Single Instruction, Multiple Data" to parallelize arithmetic at the hardware level.
See references/compute_optimization.md for detailed implementation patterns.
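A minimal object-pooling sketch, assuming fixed-size buffers that can be reused across requests:

```python
class BufferPool:
    """Reuse fixed-size bytearrays instead of allocating in the hot path."""
    def __init__(self, size: int, count: int):
        self._size = size
        self._free = [bytearray(size) for _ in range(count)]  # preallocate at startup

    def acquire(self) -> bytearray:
        # Pop a recycled buffer; falling back to allocation keeps the pool
        # safe under bursts, at the cost of one allocation.
        return self._free.pop() if self._free else bytearray(self._size)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)  # return for reuse; no free(), no GC churn

pool = BufferPool(size=4096, count=64)
buf = pool.acquire()
# ... fill and process buf in the hot path ...
pool.release(buf)
```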
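And a minimal SPSC ring-buffer sketch. In C, C++, or Rust the head and tail indices would be atomics with acquire/release barriers; this Python version leans on the GIL for the same single-writer guarantees and shows only the shape of the technique:

```python
class SpscRingBuffer:
    """Single-producer/single-consumer queue: the producer only writes head,
    the consumer only writes tail, so neither ever blocks the other."""
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0  # written only by the producer
        self.tail = 0  # written only by the consumer

    def try_push(self, item) -> bool:
        if self.head - self.tail == self.capacity:
            return False  # full; caller can retry or apply backpressure
        self.buf[self.head % self.capacity] = item
        self.head += 1  # publish only after the slot is written
        return True

    def try_pop(self):
        if self.head == self.tail:
            return None  # empty
        item = self.buf[self.tail % self.capacity]
        self.tail += 1  # free the slot only after reading
        return item
```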
Part 4: Hiding Latency (Perceived Speed)
When you can't make it faster, make it feel faster by masking delays.
- Asynchronous Processing (Pattern: Don't Block the Main Thread):
- Request Hedging: Send the same request to multiple replicas and use the first response to cut tail latency (see the hedging sketch below).
- Request Batching: Group small requests to amortize round-trip and header overhead (see the batching sketch below).
- Backpressure: Prevents queuing latency by signaling producers to slow down when the system is saturated.
- Predictive Techniques (Pattern: Guess the Future):
- Prefetching: Load data (Sequential, Spatial, or Semantic based on user intent) before it's explicitly requested.
- Optimistic Updates: Update the UI immediately assuming success; reconcile or rollback if the server fails.
- Speculative Execution (Pattern: Execute Before Needed):
- Parallel Speculation: Execute multiple possible outcomes in parallel and keep the correct one (e.g., in a search engine).
- Prewarming: Spin up resources (Lambdas, VM instances) based on historical traffic patterns before they are needed.
See references/hiding_latency.md for detailed masking strategies.
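A minimal request-hedging sketch using asyncio (fetch and the 50 ms hedge delay are assumptions to adapt; keeping the delay near the unhedged p95 bounds duplicate traffic to a few percent):

```python
import asyncio

async def hedged_request(fetch, replicas, hedge_delay_s: float = 0.05):
    """Send to the first replica; if no reply within hedge_delay_s,
    race a duplicate against a second replica and keep the winner."""
    tasks = [asyncio.create_task(fetch(replicas[0]))]
    done, pending = await asyncio.wait(tasks, timeout=hedge_delay_s)
    if not done:
        if len(replicas) > 1:
            tasks.append(asyncio.create_task(fetch(replicas[1])))  # the hedge
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the slower copy to avoid wasted work
    return done.pop().result()
```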
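And a minimal batching sketch: a worker drains a queue into batches, flushing when the batch fills or the oldest request has waited long enough (send_batch is an assumed async function mapping a list of requests to a list of responses):

```python
import asyncio

async def batch_worker(queue, send_batch, max_items: int = 32,
                       max_wait_s: float = 0.005):
    """Coalesce (request, future) pairs from the queue into one backend call."""
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]           # block for the first item
        deadline = loop.time() + max_wait_s
        while len(batch) < max_items:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break  # the oldest item has waited long enough
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        responses = await send_batch([req for req, _ in batch])
        for (_, fut), resp in zip(batch, responses):
            fut.set_result(resp)

# Caller side:
#   fut = asyncio.get_running_loop().create_future()
#   await queue.put((request, fut))
#   response = await fut
```

The max_wait_s window is the latency you pay per request to amortize round trips; size it well below your latency budget.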
Bundled Resources
Scripts
Use these for diagnostics and quick calculations:
- scripts/diagnose_latency.py: Interactive diagnostic checklist runner
- scripts/latency_constants.py: Latency constants for ballpark calculations
Code Examples
See code-examples/ for implementations of key techniques from the book.
Latency Constants (Quick Reference)
| Operation | Time | Order (ns) |
|---|---|---|
| CPU cycle (3 GHz) | 0.3 ns | 10⁻¹ |
| L1 cache access | 1 ns | 10⁰ |
| DRAM access | 100 ns | 10² |
| NVMe disk access | 10 μs | 10⁴ |
| SSD disk access | 100 μs | 10⁵ |
| Network NYC → London | 60 ms | 10⁷ |
Human Perception
| Perception | Time |
|---|---|
| Immediate (no delay perceived) | < 100 ms |
| Instant (feels fast) | < 1 s |
| Slow | > 10 s |