MCP Health Checker

What it does

MCP (Model Context Protocol) servers are how OpenClaw connects to external tools — but connections go stale silently. A crashed MCP server doesn't throw an error until the agent tries to use it, causing confusing mid-task failures.

MCP Health Checker proactively monitors all configured MCP connections. It pings servers, measures latency, tracks uptime history, and alerts you before a stale connection causes a problem.

Inspired by OpenLobster's MCP connection health monitoring and OAuth 2.1+PKCE token refresh tracking.

When to invoke

Automatically every 6 hours (cron) — silent background health check
Manually before starting a task that depends on MCP tools
When an MCP tool call fails unexpectedly — diagnose the connection
After restarting MCP servers — verify all connections restored

Health checks performed

Check	What it tests	Severity on failure
REACHABLE	Server responds to connection probe	CRITICAL
LATENCY	Response time under threshold (default: 5s)	HIGH
STALE	Connection age exceeds max (default: 24h)	HIGH
TOOL_COUNT	Server exposes expected number of tools	MEDIUM
CONFIG_VALID	MCP config entry has required fields	MEDIUM
AUTH_EXPIRY	OAuth/API token approaching expiration	HIGH
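
As a rough illustration of how the table could translate into code, the sketch below maps each check to its severity and evaluates the per-probe checks. The `Probe` class, its field names, and the one-hour expiry warning window are assumptions made for this example, not part of check.py. CONFIG_VALID is left out because it concerns the config entry rather than a live probe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Severity per check, copied from the table above.
SEVERITY = {
    "REACHABLE": "CRITICAL",
    "LATENCY": "HIGH",
    "STALE": "HIGH",
    "TOOL_COUNT": "MEDIUM",
    "CONFIG_VALID": "MEDIUM",
    "AUTH_EXPIRY": "HIGH",
}


@dataclass
class Probe:
    """Hypothetical result of probing one server (field names are illustrative)."""
    reachable: bool
    latency_s: float
    connected_at: datetime
    tool_count: int
    expected_tools: Optional[int] = None
    token_expires_at: Optional[datetime] = None


def failed_checks(p, now, max_latency_s=5.0, max_age=timedelta(hours=24),
                  expiry_warning=timedelta(hours=1)):
    """Return (check, severity) pairs for every probe-level check that fails."""
    if not p.reachable:
        # Nothing else can be measured on an unreachable server.
        return [("REACHABLE", SEVERITY["REACHABLE"])]
    failed = []
    if p.latency_s > max_latency_s:
        failed.append("LATENCY")
    if now - p.connected_at > max_age:
        failed.append("STALE")
    if p.expected_tools is not None and p.tool_count != p.expected_tools:
        failed.append("TOOL_COUNT")
    if p.token_expires_at is not None and p.token_expires_at - now < expiry_warning:
        failed.append("AUTH_EXPIRY")
    return [(name, SEVERITY[name]) for name in failed]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
slow = Probe(reachable=True, latency_s=6.0,
             connected_at=now - timedelta(hours=2), tool_count=4)
print(failed_checks(slow, now))  # [('LATENCY', 'HIGH')]
```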

How to use

python3 check.py --ping                     # Ping all configured MCP servers
python3 check.py --ping --server <name>     # Ping a specific server
python3 check.py --ping --timeout 3         # Custom timeout in seconds
python3 check.py --status                   # Last check summary from state
python3 check.py --history                  # Show past check results
python3 check.py --config                   # Validate MCP config entries
python3 check.py --format json              # Machine-readable output
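
For readers who want to see how these flags might fit together, here is a minimal argparse sketch. It is not the actual check.py source. The flag names come from this document, while the help strings, the `text` format name, and keeping `--max-age` as a plain string are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags this document lists (illustrative only)."""
    parser = argparse.ArgumentParser(prog="check.py")
    parser.add_argument("--ping", action="store_true",
                        help="probe configured MCP servers")
    parser.add_argument("--server", metavar="NAME",
                        help="limit --ping to a single server")
    parser.add_argument("--timeout", type=float, default=5.0,
                        help="latency threshold in seconds")
    # Kept as a string here; how check.py parses durations is not documented.
    parser.add_argument("--max-age", default="24h",
                        help="staleness threshold")
    parser.add_argument("--status", action="store_true",
                        help="print the last check summary from state")
    parser.add_argument("--history", action="store_true",
                        help="show past check results")
    parser.add_argument("--config", action="store_true",
                        help="validate MCP config entries")
    # "text" as the default format name is an assumption.
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="output format")
    return parser


args = build_parser().parse_args(["--ping", "--server", "filesystem", "--timeout", "3"])
print(args.server, args.timeout)  # filesystem 3.0
```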

Cron wakeup behaviour

Every 6 hours:

Read MCP server configuration from ~/.openclaw/config/ (YAML/JSON)
For each configured server:
- Attempt connection probe (stdio subprocess spawn or HTTP request, depending on transport)
- Measure response latency
- Check connection age against staleness threshold
- Verify tool listing matches expected count (if tracked)
- Check auth token expiry (if applicable)
Update state with per-server health records
Print summary: healthy / degraded / unreachable counts
Exit 1 if any CRITICAL findings
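
The last two steps can be sketched as follows. This assumes a server counts as unreachable when it has a CRITICAL finding and as degraded when it has any other finding. The summary wording and the `summarize` and `classify` names are hypothetical, as are the server names other than filesystem.

```python
from collections import Counter


def classify(severities):
    """Map one server's failed-check severities to a summary bucket."""
    if "CRITICAL" in severities:
        return "unreachable"
    if severities:
        return "degraded"
    return "healthy"


def summarize(findings):
    """Turn {server_name: [severity, ...]} into (summary line, exit code)."""
    counts = Counter(classify(sev) for sev in findings.values())
    line = "healthy: {} / degraded: {} / unreachable: {}".format(
        counts["healthy"], counts["degraded"], counts["unreachable"])
    # Exit 1 if any CRITICAL finding exists, otherwise 0.
    exit_code = 1 if counts["unreachable"] else 0
    return line, exit_code


line, code = summarize({"filesystem": [], "github": ["HIGH"], "search": ["CRITICAL"]})
print(line)  # healthy: 1 / degraded: 1 / unreachable: 1
print(code)  # 1
```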

Procedure

Step 1 — Run a health check

python3 check.py --ping

Review the output. Healthy servers show a green check. Degraded servers show latency warnings. Unreachable servers show a critical alert.

Step 2 — Diagnose a specific server

python3 check.py --ping --server filesystem

This prints detailed output for a single server: latency, last seen, tool count, and auth status.

Step 3 — Validate configuration

python3 check.py --config

Checks that every MCP config entry has the required fields: command and args for stdio servers, or url for HTTP/SSE servers.
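
A minimal sketch of such a validation, assuming that an entry with a url is an HTTP/SSE server and that any other entry is a stdio server needing both command and args. The function names and the sample entries are hypothetical.

```python
def missing_fields(entry):
    """Return the required fields missing from one MCP config entry."""
    if entry.get("url"):
        # Assumed: a non-empty url is all an HTTP/SSE entry needs.
        return []
    # Assumed: everything else is a stdio entry needing command and args.
    return [name for name in ("command", "args") if name not in entry]


def validate(servers):
    """Map each invalid server name to the fields it is missing."""
    problems = {}
    for name, entry in servers.items():
        missing = missing_fields(entry)
        if missing:
            problems[name] = missing
    return problems


servers = {
    "filesystem": {"command": "npx", "args": ["-y", "example-server"]},
    "remote": {"url": "https://example.com/mcp"},
    "broken": {"command": "node"},
}
print(validate(servers))  # {'broken': ['args']}
```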

Step 4 — Review history

python3 check.py --history

Shows uptime trends over the last 20 checks. Use it to spot servers that fail intermittently.
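
To make "uptime over the last 20 checks" concrete, the sketch below computes the reachable fraction per server. It assumes each history entry is a mapping from server name to a reachable flag, which may not match the real state layout. The `uptime` and `intermittent` helpers are illustrative.

```python
def uptime(history, server, window=20):
    """Fraction of the last `window` checks in which `server` was reachable."""
    recent = [check[server] for check in history[-window:] if server in check]
    if not recent:
        return None  # No data for this server in the window.
    return sum(recent) / len(recent)


def intermittent(history, server, window=20):
    """True when the server was sometimes up and sometimes down."""
    value = uptime(history, server, window)
    return value is not None and 0 < value < 1


# Hypothetical history: "github" fails on every fourth check.
history = [{"filesystem": True, "github": i % 4 != 0} for i in range(40)]
print(uptime(history, "github"))  # 0.75
```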

State

Per-server health records and check history are stored in ~/.openclaw/skill-state/mcp-health-checker/state.yaml.

Top-level fields: last_check_at, servers (a list of per-server records), and check_history.
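
The sketch below shows one way a check result could be merged into a state with those three top-level fields. Only the top-level field names come from this document. The per-server keys, the history entry layout, and the cap of 20 history entries are assumptions. The state is kept as a plain dict here, and writing it back to YAML would need a YAML library.

```python
from datetime import datetime, timezone


def record_check(state, results, now, keep=20):
    """Return a new state dict with this check's results merged in."""
    entry = {"checked_at": now.isoformat(), "results": results}
    return {
        "last_check_at": now.isoformat(),
        # One record per server, sorted by name for stable output.
        "servers": [{"name": name, **record}
                    for name, record in sorted(results.items())],
        # Assumed: only the most recent `keep` checks are retained.
        "check_history": (state.get("check_history", []) + [entry])[-keep:],
    }


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
state = {}
for _ in range(25):
    state = record_check(state, {"filesystem": {"status": "healthy"}}, now)
print(len(state["check_history"]))  # 20
```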

Notes

Does not modify MCP configuration — read-only monitoring
Connection probes use the same transport as the MCP server (stdio subprocess spawn or HTTP GET)
For stdio servers: probes verify the process can start and respond to initialize
For HTTP/SSE servers: probes send a health-check HTTP request
Latency threshold configurable via --timeout (default: 5s)
Staleness threshold configurable via --max-age (default: 24h)
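
The stdio probe described above could look roughly like the sketch below, which spawns the server, sends a JSON-RPC initialize request as a single line, and times the reply. It is a simplified illustration rather than the code in check.py. The `probe_stdio` name and client info are hypothetical. The sketch also assumes the server exits once stdin is closed, so the measured latency includes process startup and shutdown.

```python
import json
import subprocess
import time

# JSON-RPC initialize request; client name and version are placeholders.
INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "mcp-health-checker", "version": "0.0.0"},
    },
}


def probe_stdio(command, args, timeout=5.0):
    """Spawn a stdio MCP server, send initialize, return (ok, latency_seconds)."""
    started = time.monotonic()
    try:
        proc = subprocess.Popen(
            [command, *args],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
        # communicate() writes the request, closes stdin, and waits for exit.
        out, _ = proc.communicate(json.dumps(INITIALIZE) + "\n", timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # Reap the killed process.
        return False, time.monotonic() - started
    except OSError:
        # The command could not be started at all.
        return False, time.monotonic() - started
    latency = time.monotonic() - started
    lines = out.splitlines()
    try:
        reply = json.loads(lines[0]) if lines else {}
    except ValueError:
        return False, latency
    # A healthy server answers request id 1 with a result object.
    ok = isinstance(reply, dict) and reply.get("id") == 1 and "result" in reply
    return ok, latency
```

A server that stays alive after stdin closes would be reported as failing by this sketch, so a production probe would more likely read the first reply line with its own timeout and then terminate the process.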
