websocket
SKILL.md
WebSocket Skill
Production-grade WebSocket patterns for real-time communication in Python applications.
When to Use This Skill
Use this skill when:
- Building real-time features (chat, notifications, live updates)
- Implementing bidirectional client-server communication
- Creating collaborative features (multi-user editing, presence)
- Building dashboards with live data feeds
- Implementing game servers or real-time applications
Core Mental Model
A WebSocket connection is fundamentally different from HTTP:
HTTP: Request → Response → Done
WebSocket: Connect → Accept → [Receive Loop] → Disconnect → Cleanup
The receive loop is unavoidable because:
- The connection is long-lived - something must continuously listen
- Messages arrive asynchronously from the client
- The loop defines the connection's lifetime boundary
- Even when frameworks abstract it, a loop exists underneath
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ WebSocket System │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────────────────────────┐ │
│ │ Client 1 │───▶│ │ │
│ └──────────────┘ │ │ │
│ │ Connection Manager │ │
│ ┌──────────────┐ │ ┌─────────────────────────────┐ │ │
│ │ Client 2 │───▶│ │ connections: Set[WebSocket] │ │ │
│ └──────────────┘ │ │ rooms: Dict[str, Set[WS]] │ │ │
│ │ └─────────────────────────────┘ │ │
│ ┌──────────────┐ │ │ │
│ │ Client N │───▶│ connect() → track │ │
│ └──────────────┘ │ disconnect() → remove │ │
│ │ broadcast() → fan-out │ │
│ │ send_to_room() → targeted │ │
│ └──────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
File Structure
app/
├── websocket/
│ ├── __init__.py
│ ├── manager.py # ConnectionManager class
│ ├── handlers.py # Message handlers by type
│ ├── models.py # Message schemas (Pydantic)
│ └── router.py # WebSocket endpoint
└── main.py # Mount WebSocket router
Core Principles
1. Explicit Connection Lifecycle
Connections must be:
- Accepted explicitly - Never assume connection success
- Tracked centrally - Single source of truth
- Removed on disconnect - Eager cleanup prevents leaks
- Silent failures are worse than explicit disconnects
2. Centralized Connection Management
# Connection state lives in ONE place
class ConnectionManager:
def __init__(self):
self.active_connections: set[WebSocket] = set()
async def connect(self, websocket: WebSocket) -> None:
await websocket.accept()
self.active_connections.add(websocket)
def disconnect(self, websocket: WebSocket) -> None:
self.active_connections.discard(websocket)
Business logic must NEVER handle socket tracking directly.
3. Receive Loop Responsibility
The receive loop:
- Defines connection lifetime
- Is the only entry point for client messages
- Must handle disconnection gracefully
async def websocket_endpoint(websocket: WebSocket):
await manager.connect(websocket)
try:
while True: # THE LOOP
data = await websocket.receive_text()
# validate → dispatch → respond
except WebSocketDisconnect:
pass # Expected, not exceptional
finally:
manager.disconnect(websocket) # Always cleanup
4. Message Contract Discipline
Every message must be:
- Structured: JSON, not raw strings
- Typed:
{"type": "chat", "payload": {...}} - Intent-declaring: The
typefield routes to handlers
class WebSocketMessage(BaseModel):
type: str
payload: dict = {}
5. Defensive Failure Handling
- Disconnections happen constantly (normal, not exceptional)
- Writes can fail even after reads succeed
- Dead connections must be cleaned eagerly
- One failure must not cascade to others
Extended Concepts
Rooms / Channels
A room is a logical routing layer, not a socket feature:
class ConnectionManager:
def __init__(self):
self.connections: set[WebSocket] = set()
self.rooms: dict[str, set[WebSocket]] = defaultdict(set)
def join_room(self, websocket: WebSocket, room_id: str) -> None:
self.rooms[room_id].add(websocket)
def leave_room(self, websocket: WebSocket, room_id: str) -> None:
self.rooms[room_id].discard(websocket)
async def broadcast_to_room(self, room_id: str, message: str) -> None:
for connection in self.rooms[room_id]:
await connection.send_text(message)
Key insights:
- Connection belongs to exactly 1 manager, but 0+ rooms
- Room membership is explicit (join/leave operations)
- Leaving a room ≠ disconnecting (orthogonal concerns)
Performance Mental Model
Performance is about control, not speed:
| Cost Center | Why It Matters |
|---|---|
| Fan-out O(n) | Broadcast to 1000 clients = 1000 sends |
| Serialization | JSON encoding per message adds up |
| Slow clients | One blocked send() can stall others |
Safety principles:
- Never block the receive loop with slow operations
- Writes must not delay reads
- One slow client must not affect others
Generation Order
When implementing WebSocket features:
- Define message types - What messages will be exchanged?
- Create ConnectionManager - Central state management
- Implement receive loop - WebSocket endpoint
- Add message handlers - Route by message type
- Add rooms (if needed) - Logical grouping
- Add error handling - Defensive cleanup
Quick Reference
# Accept connection
await websocket.accept()
# Receive message
data = await websocket.receive_text()
data = await websocket.receive_json()
# Send message
await websocket.send_text(message)
await websocket.send_json(data)
# Close connection
await websocket.close(code=1000)
# Handle disconnection
from starlette.websockets import WebSocketDisconnect
What "Production-Grade" Means
Not:
- Feature-rich
- Horizontally scalable (yet)
- Prematurely optimized
Yes:
- Predictable behavior under all conditions
- Clear separation: connection management | business logic | routing
- Safe defaults (cleanup on failure, structured messages)
- Clear upgrade paths to rooms, scaling, tuning
Explicit Non-Goals
These are intentionally excluded to keep the core pattern clean:
- Authentication / authorization (handle before WebSocket upgrade)
- Persistence (message history is a separate concern)
- Distributed state (requires Redis/pubsub, out of scope)
- Message ordering guarantees across processes
References
See the references/ directory for:
connection-manager-pattern.md- Centralized connection trackingreceive-loop-pattern.md- Message loop structure and lifetimemessage-protocol-pattern.md- JSON message contractsrooms-pattern.md- Logical routing and channelserror-handling-pattern.md- Defensive failure handling