Shadow Mode Migration Pattern
Shadow mode mirrors production traffic to a new system without affecting users. The shadow system's responses are discarded — only the production response reaches the user — but both responses are logged and compared to validate correctness.
When to Use This Skill
| Use this skill when... | Use dual-write instead when... |
|---|---|
| Validating read behavior of a replacement service | Both systems need to persist writes |
| Testing performance under real production load | You need the new store to be authoritative |
| Comparing response correctness before cutover | Migrating data stores that must stay in sync |
| Evaluating a new service version safely | The new system needs to receive and store mutations |
| Load testing a new deployment with real traffic | You need strong consistency between systems |
Core Concepts
Traffic Flow
        Client Request
              │
              ▼
       ┌─────────────┐
       │  Router /   │
       │    Proxy    │
       └──┬───────┬──┘
          │       │
          ▼       ▼
        Prod    Shadow
       System   System
          │       │
          ▼       ▼
        Prod    Shadow
      Response  Response
          │       │
          ▼       ▼
       Client   Compare & Log
                (then discard)
Shadow Modes
| Mode | Description | Use case |
|---|---|---|
| Full mirror | 100% of traffic duplicated | Final validation before cutover |
| Sampled mirror | Percentage of traffic (e.g., 10%) | Early validation, capacity-constrained shadow |
| Selective mirror | Specific request types or endpoints | Targeted validation of changed behavior |
| Replay mirror | Recorded traffic replayed offline | Testing without live shadow infrastructure |
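
The sampled and selective modes both reduce to a per-request decision. The following is a minimal sketch of one way to make that decision, assuming each request carries a stable ID and a path; `should_mirror`, `sample_percent`, and `allowed_prefixes` are illustrative names, not part of any particular proxy or mesh.

```python
import hashlib


def should_mirror(request_id: str, path: str, sample_percent: float,
                  allowed_prefixes: tuple[str, ...] = ()) -> bool:
    """Decide whether one request is copied to the shadow system."""
    # Selective mirror: when prefixes are configured, only matching paths qualify.
    if allowed_prefixes and not path.startswith(allowed_prefixes):
        return False
    # Sampled mirror: hash the request ID into one of 10,000 buckets so the
    # decision is deterministic for a given ID (0.01% granularity).
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_percent * 100
```

With `sample_percent=100` and no prefixes this behaves as a full mirror. Hashing the ID rather than drawing a random number means a retried request lands on the same side of the decision, which can make discrepancies easier to reproduce.
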
Implementation Architecture
Key Components
| Component | Responsibility |
|---|---|
| Traffic splitter | Duplicates requests to shadow system |
| Shadow router | Forwards mirrored requests, manages timeouts |
| Response comparator | Compares prod vs shadow responses |
| Discrepancy logger | Records differences with full context |
| Metrics collector | Tracks match rates, latency, error rates |
| Kill switch | Disables shadow traffic instantly if issues arise |
Deployment Topology
| Topology | How it works | Trade-offs |
|---|---|---|
| Proxy-based | Load balancer or API gateway mirrors requests | Simple setup, adds proxy hop |
| Application-level | Application code sends async copy of request | Fine-grained control, code coupling |
| Infrastructure-level | Service mesh (Istio, Linkerd) mirrors traffic | No code changes, requires mesh |
| Log replay | Capture request logs, replay against shadow | No live infrastructure needed, not real-time |
Implementation Patterns
Proxy-Based Mirroring
Configure the load balancer or API gateway to:
- Forward the original request to the production backend
- Clone the request and send it to the shadow backend
- Return only the production response to the client
- Log the shadow response but never return it to the client
- Give the shadow request its own timeout, independent of production
Application-Level Mirroring
- Intercept the incoming request at the application layer
- Process the request normally through the production path
- Asynchronously send a copy of the request to the shadow service
- Do not block the production response on the shadow response
- Compare responses in a background worker
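
The steps above can be sketched with a background thread pool. This is an illustration under assumptions, not a reference implementation: requests and responses are modeled as plain dictionaries, and `ShadowMirror`, `prod`, `shadow`, and `on_result` are hypothetical names standing in for your own handlers and comparison hook.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Handler = Callable[[dict], dict]
# Called with (request, prod_response, shadow_response, shadow_error).
ResultHook = Callable[[dict, dict, Optional[dict], Optional[Exception]], None]


class ShadowMirror:
    """Serve from production and mirror each request to shadow in the background."""

    def __init__(self, prod: Handler, shadow: Handler,
                 on_result: ResultHook, max_workers: int = 4) -> None:
        self._prod = prod
        self._shadow = shadow
        self._on_result = on_result
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def handle(self, request: dict) -> dict:
        # The production path runs exactly as it would without shadow mode.
        prod_response = self._prod(request)
        try:
            # Deep copies keep shadow code from mutating objects that the
            # production path still holds.
            self._pool.submit(self._run_shadow, copy.deepcopy(request),
                              copy.deepcopy(prod_response))
        except Exception:
            pass  # a mirroring failure must never fail the user request
        return prod_response

    def _run_shadow(self, request: dict, prod_response: dict) -> None:
        shadow_response: Optional[dict] = None
        error: Optional[Exception] = None
        try:
            shadow_response = self._shadow(request)
        except Exception as exc:  # shadow failures are recorded, not raised
            error = exc
        try:
            self._on_result(request, prod_response, shadow_response, error)
        except Exception:
            pass  # comparison bugs stay inside the worker

    def close(self) -> None:
        # Wait for in-flight shadow work, for example at shutdown or in tests.
        self._pool.shutdown(wait=True)
```

The production response is returned before any shadow work starts, so a slow or failing shadow cannot delay it. Note that `ThreadPoolExecutor` queues submitted work without a bound, so a real deployment would likely add a queue limit and a per-request timeout inside the shadow handler.
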
Response Comparison Strategy
Compare responses field by field with configurable rules:
| Field type | Comparison approach |
|---|---|
| IDs, timestamps | Ignore (expected to differ) |
| Computed values | Compare within tolerance (e.g., floating point) |
| Collections | Compare as sets (ignore ordering unless significant) |
| Status codes | Exact match required |
| Error responses | Categorize and compare error types |
| Headers | Compare relevant headers only (Content-Type, Cache-Control) |
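
A few of these rules can be sketched as a small comparator. The field names (`id`, `request_id`, `timestamp`, `tags`) and the response shape (`status` plus a flat `body` dictionary) are assumptions for illustration; header comparison and error categorization are left out to keep the sketch short.

```python
import math

# Hypothetical field names; adjust to the real response schema.
IGNORED_FIELDS = {"id", "request_id", "timestamp"}
UNORDERED_FIELDS = {"tags"}
TOLERANCE = 1e-6


def _is_float_pair(a: object, b: object) -> bool:
    # Apply tolerance only when a float is involved; integers compare exactly.
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in (a, b))
    return numeric and (isinstance(a, float) or isinstance(b, float))


def compare_responses(prod: dict, shadow: dict) -> list[str]:
    """Return human-readable discrepancies; an empty list means a match."""
    diffs: list[str] = []

    # Status codes: exact match required.
    if prod.get("status") != shadow.get("status"):
        diffs.append(f"status: prod={prod.get('status')} shadow={shadow.get('status')}")

    prod_body = prod.get("body") or {}
    shadow_body = shadow.get("body") or {}
    for key in sorted(set(prod_body) | set(shadow_body)):
        if key in IGNORED_FIELDS:
            continue  # expected to differ
        if key not in prod_body or key not in shadow_body:
            diffs.append(f"body.{key}: present in only one response")
            continue
        a, b = prod_body[key], shadow_body[key]
        if _is_float_pair(a, b):
            equal = math.isclose(a, b, rel_tol=TOLERANCE, abs_tol=TOLERANCE)
        elif key in UNORDERED_FIELDS and isinstance(a, list) and isinstance(b, list):
            # Order-insensitive comparison that still counts duplicates.
            equal = sorted(a, key=repr) == sorted(b, key=repr)
        else:
            equal = a == b
        if not equal:
            diffs.append(f"body.{key}: prod={a!r} shadow={b!r}")
    return diffs
```

Returning a list of discrepancies rather than a boolean gives the discrepancy logger something to categorize, which is what lets known differences be tracked and resolved over time.
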
Handling Stateful Requests
Shadow mode works best with read-only requests. For stateful (write) requests:
| Approach | Description |
|---|---|
| Skip writes | Only mirror read requests to shadow |
| Isolated state | Shadow has its own database seeded from production |
| Dry-run writes | Shadow validates the write but does not persist |
| Record-only | Log what shadow would have written, compare intent |
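
One way to wire these approaches into the mirroring path is a small routing function. The sketch below assumes an HTTP API that follows standard method semantics, where `GET`, `HEAD`, and `OPTIONS` are read-only; `WriteStrategy`, `shadow_action`, and the returned action strings are illustrative names.

```python
from enum import Enum

# Methods defined as safe (read-only) by HTTP semantics.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


class WriteStrategy(Enum):
    SKIP_WRITES = "skip_writes"
    ISOLATED_STATE = "isolated_state"
    DRY_RUN = "dry_run"
    RECORD_ONLY = "record_only"


def shadow_action(method: str, strategy: WriteStrategy) -> str:
    """Return what the mirror should do with a request of this method."""
    if method.upper() in SAFE_METHODS:
        return "mirror"
    if strategy is WriteStrategy.SKIP_WRITES:
        return "skip"
    if strategy is WriteStrategy.ISOLATED_STATE:
        return "mirror"  # safe only because shadow owns a separate database
    if strategy is WriteStrategy.DRY_RUN:
        return "mirror_dry_run"  # shadow validates but must not persist
    return "record_intent"  # log what shadow would have written
```

This only holds if the API actually treats safe methods as read-only; an endpoint that mutates state on `GET` would need to be excluded explicitly.
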
Gradual Rollout
| Phase | Traffic % | Duration | Goal |
|---|---|---|---|
| 1. Smoke test | 1% | Hours | Verify shadow receives and processes requests |
| 2. Canary | 5-10% | Days | Identify obvious discrepancies |
| 3. Validation | 25-50% | Days-weeks | Build confidence in match rate |
| 4. Full mirror | 100% | Days-weeks | Final validation before cutover |
Validation Metrics
| Metric | Target | Description |
|---|---|---|
| Response match rate | > 99.9% | Percentage of shadow responses that match production after comparison rules are applied |
| Shadow latency (P50) | Within 2x of prod | Shadow performance baseline |
| Shadow latency (P99) | Monitored | Tail latency under real load |
| Shadow error rate | ≤ prod error rate | Shadow should not produce more errors than production |
| Shadow availability | Monitored | Shadow uptime (not a blocker) |
| Discrepancy categories | Trending to zero | Known differences resolved over time |
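
The numeric targets in the table can be checked mechanically. The sketch below assumes one `Sample` record per mirrored request and uses the nearest-rank method for percentiles; `Sample`, `percentile`, and `summarize` are hypothetical names, and the thresholds are the ones from the table.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    matched: bool
    prod_latency_ms: float
    shadow_latency_ms: float
    prod_error: bool
    shadow_error: bool


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def summarize(samples: list[Sample]) -> dict:
    if not samples:
        raise ValueError("no samples collected")
    n = len(samples)
    match_rate = sum(s.matched for s in samples) / n
    prod_p50 = percentile([s.prod_latency_ms for s in samples], 50)
    shadow_p50 = percentile([s.shadow_latency_ms for s in samples], 50)
    prod_error_rate = sum(s.prod_error for s in samples) / n
    shadow_error_rate = sum(s.shadow_error for s in samples) / n
    return {
        "match_rate": match_rate,
        "prod_p50_ms": prod_p50,
        "shadow_p50_ms": shadow_p50,
        # P99 is reported for monitoring only, not gated on.
        "shadow_p99_ms": percentile([s.shadow_latency_ms for s in samples], 99),
        "shadow_error_rate": shadow_error_rate,
        "meets_targets": (match_rate > 0.999
                          and shadow_p50 <= 2 * prod_p50
                          and shadow_error_rate <= prod_error_rate),
    }
```

Computing the summary per endpoint rather than globally tends to be more useful, since a single low-traffic endpoint with a poor match rate can hide behind a high overall figure.
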
Common Pitfalls
| Pitfall | Mitigation |
|---|---|
| Shadow affects production performance | Async mirroring, independent timeouts, kill switch |
| Shadow writes to shared resources | Isolate shadow databases, queues, and external services |
| Non-deterministic responses cause false mismatches | Configure comparison rules to ignore timestamps, IDs, nonces |
| Shadow receives stale data | Seed shadow database from recent production snapshot |
| Traffic amplification overwhelms shadow | Use sampled mirroring, auto-scaling, or circuit breakers |
| Request ordering differs between prod and shadow | Compare each request/response pair independently rather than relying on request sequence |
| Authentication tokens expire for shadow | Mint shadow-specific tokens or bypass auth in shadow |
Integration with Dual Write
Shadow mode and dual write are complementary migration techniques:
| Migration phase | Technique | Purpose |
|---|---|---|
| Early validation | Shadow mode (reads) | Verify the new system returns correct responses |
| Data sync | Dual write | Keep both stores in sync during transition |
| Pre-cutover | Both simultaneously | Shadow validates reads, dual write maintains data |
| Cutover | Dual write reversal | New system becomes primary, old becomes secondary |
| Post-cutover | Shadow mode (reversed) | Mirror to old system to verify nothing broke |
Strangler Fig Context
Both patterns are tactics within the broader Strangler Fig migration strategy:
- Identify a component to migrate
- Shadow traffic to validate the replacement
- Dual write to synchronize data stores
- Cut over reads, then writes
- Decommission the old component
- Repeat for the next component
Kill Switch Requirements
Shadow mode must have an immediate disable mechanism:
- Feature flag or configuration toggle (no deployment required)
- Disables within seconds, not minutes
- Monitored, with alerts if shadow traffic affects production
- Tested before enabling shadow traffic
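
A minimal sketch of such a switch follows, assuming the flag lives in some external store that can be read through a callable. `KillSwitch`, `read_flag`, and `ttl_seconds` are illustrative names; the short cache bounds how long a flipped flag takes to apply, and any failure to read the flag disables shadow traffic.

```python
import time
from typing import Callable


class KillSwitch:
    """Cache a remote flag briefly so every request can check it cheaply."""

    def __init__(self, read_flag: Callable[[], bool], ttl_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read_flag = read_flag
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = False  # shadow stays off until the flag has been read
        self._expires_at = float("-inf")

    def shadow_enabled(self) -> bool:
        now = self._clock()
        if now >= self._expires_at:
            try:
                self._cached = bool(self._read_flag())
            except Exception:
                # Fail closed: if the flag store is unreachable, stop mirroring.
                self._cached = False
            self._expires_at = now + self._ttl
        return self._cached
```

A caller would check `shadow_enabled()` before mirroring each request. With a two-second TTL, turning the flag off takes effect within about two seconds and needs no deployment; the injectable `clock` makes that behavior testable before shadow traffic is enabled.
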
Monitoring Checklist
- Production latency impact (should be zero or negligible)
- Shadow request success rate
- Shadow response latency distribution
- Response match rate by endpoint
- Discrepancy log volume and categories
- Shadow system resource utilization
- Kill switch status and responsiveness
Agentic Optimizations
| Context | Approach |
|---|---|
| Architecture review | Verify shadow isolation (no shared writes), kill switch exists |
| Code review | Check async mirroring does not block production path |
| Implementation | Start with proxy-based mirroring at 1%, increase gradually |
| Testing | Verify kill switch works, confirm production is unaffected when shadow fails |
Quick Reference
| Term | Definition |
|---|---|
| Shadow system | The new system receiving mirrored traffic |
| Production system | The live system serving real users |
| Traffic splitter | Component that duplicates requests |
| Match rate | Percentage of shadow responses matching production |
| Kill switch | Mechanism to instantly disable shadow traffic |
| Dark launching | Closely related term, often used interchangeably with shadow mode: the feature runs in production but is invisible to users |
| Canary traffic | Small percentage of mirrored requests for initial validation |
| Strangler fig | Broader migration strategy of incrementally replacing components |