skills/apankov1/quality-engineering/websocket-client-resilience

websocket-client-resilience

SKILL.md

WebSocket Client Resilience

6 resilience patterns for WebSocket clients, extracted from real-world mobile network conditions.

Mobile WebSocket connections fail in ways that local development environments don't surface. P99 latency on 4G networks is 5-8 seconds. A 5-second health check timeout causes false positives on every slow network.

When to use: Implementing WebSocket client reconnection logic, building real-time features with persistent connections, mobile app WebSocket handling, any client that maintains long-lived server connections.

When not to use: Server-side WebSocket handlers, HTTP request/response patterns, Server-Sent Events (SSE).

Rationalizations (Do Not Skip)

Rationalization Why It's Wrong Required Action
"Our users are on fast networks" Mobile users exist. Even desktop WiFi has transient blips. Test with throttled networks
"Simple retry is enough" Without jitter, all clients retry at once after an outage Add randomized jitter
"One missed heartbeat means disconnected" Network blips last 1-3 seconds. Single miss = false positive. Use hysteresis (2+ misses)
"We'll add resilience later" Reconnection logic is foundational. Retrofitting it is much harder. Build it in from the start
"5 seconds is plenty of timeout" Mobile P99 is 5-8s. That "timeout" is normal latency for mobile. Use 10s+ for mobile

Included Utilities

// WebSocket resilience pattern implementations (zero dependencies)
import {
  getBackoffDelay,
  circuitBreakerTransition,
  shouldDisconnect,
  CommandAckTracker,
  detectSequenceGap,
  classifyTimeout,
} from './resilience.ts';

getBackoffDelay() accepts an optional RNG function for deterministic tests:

const lowJitter = getBackoffDelay(0, 1000, 30000, () => 0); // 750
const highJitter = getBackoffDelay(0, 1000, 30000, () => 1); // 1250

Quick Reference

Pattern Detect Fix Severity
Backoff without jitter Math.pow(2, attempt) without Math.random() Add +/- 25% jitter must-fail
No circuit breaker Reconnect without failure counter Trip after 5 failures, 60s cooldown must-fail
Single heartbeat miss setTimeout disconnect without miss counter Require 2+ missed heartbeats should-fail
No command ack ws.send() without commandId tracking Track pending commands, timeout at 30s nice-to-have
No sequence tracking onmessage without sequence check Track lastReceivedSequence, detect gaps nice-to-have
Short mobile timeout Health timeout < 10s Use 10s+ for all health checks must-fail

Coverage

Pattern Utility Status
1. Backoff with jitter getBackoffDelay() Code + tests
2. Circuit breaker circuitBreakerTransition() Code + tests
3. Heartbeat hysteresis shouldDisconnect() Code + tests
4. Command acknowledgment CommandAckTracker Code + tests
5. Sequence gap detection detectSequenceGap() Code + tests
6. Mobile-aware timeouts classifyTimeout() Code + tests

All 6 patterns have executable utilities and tests.

Companion Skills

This skill provides client-side resilience patterns, not WebSocket server architecture guidance. For broader methodology:

  • Search websocket on skills.sh for server-side handlers, protocol design, and connection management
  • The circuit breaker is a state machine (closed/open/half-open) — use model-based-testing for systematic transition matrix coverage of all state pairs
  • Backoff, circuit breaker, and retry patterns need fault simulation — use fault-injection-testing for circuit breaker testing utilities and queue preservation assertions
  • Connection lifecycle events should log at correct levels — use observability-testing to assert structured log output on connect/disconnect/reconnect

Framework Adaptation

These patterns are framework-agnostic. They work with:

  • Browser: Native WebSocket, Socket.IO, ws library
  • React/Vue/Svelte: Wrap in composable/hook
  • React Native / Flutter: Same patterns, different APIs
  • Node.js: ws library for server-to-server WebSocket clients

The core principle: real-world network conditions are more variable than controlled environments. Design for mobile latency, not localhost.

See patterns.md for full before/after code examples and detection commands for each pattern.

Weekly Installs
17
GitHub Stars
5
First Seen
Feb 23, 2026
Installed on
gemini-cli17
opencode17
github-copilot17
codex17
kimi-cli17
claude-code17