f5-health-check
F5 BIG-IP Health Check
Perform comprehensive health assessments on F5 BIG-IP appliances using the iControl REST API via MCP. This skill defines the systematic approach for evaluating BIG-IP health across virtual servers, pools, profiles, iRules, and system logs.
When to Use
- Proactive daily/weekly BIG-IP health monitoring
- Pre-change and post-change validation for load balancer changes
- Incident response -- first thing to run when application delivery is impacted
- Capacity planning for virtual server and pool utilization
- Compliance checks for operational readiness of ADC infrastructure
How to Call the Tools
The F5 MCP server provides 6 tools. Call them via mcp-call with the required environment variables:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" <tool_name> '{"param":"value"}'
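For scripted runs, the same call can be assembled in Python so the JSON argument is serialized rather than hand-quoted. This is a minimal sketch that assumes the environment variables from the template above. The helper names build_f5_call and run_f5_tool are hypothetical and are not part of the F5 MCP server:

```python
import json
import os
import subprocess


def build_f5_call(tool_name, args, env=None):
    """Return (argv, env) for one F5 MCP tool call, mirroring the shell template."""
    env = dict(os.environ if env is None else env)
    # The F5 MCP server reads the device address and credentials from these variables.
    env["IP_ADDRESS"] = env.get("F5_IP_ADDRESS", "")
    env["Authorization_string"] = env.get("F5_AUTH_STRING", "")
    argv = [
        "python3",
        env.get("MCP_CALL", ""),
        # Passed as one argument, like the double-quoted string in the shell template.
        "python3 -u " + env.get("F5_MCP_SCRIPT", ""),
        tool_name,
        json.dumps(args),  # serialized JSON, so no manual quoting is needed
    ]
    return argv, env


def run_f5_tool(tool_name, args):
    """Execute one tool call and return its stdout (raises CalledProcessError on failure)."""
    argv, env = build_f5_call(tool_name, args)
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_f5_tool("list_tool", {"object_name": "", "object_type": "virtual"}) issues the same request as the Step 1 command.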
Available Tools
| Tool | Purpose | Key Arguments |
|---|---|---|
| list_tool | List F5 objects by type | object_name, object_type (virtual/pool/irule/profile) |
| show_stats_tool | Show statistics for an F5 object | object_name, object_type (virtual/pool/irule/profile) |
| show_logs_tool | Show N lines of system logs | lines_number |
| create_tool | Create an F5 object via POST | url_body, object_type |
| update_tool | Update an F5 object via PATCH | url_body, object_type, object_name |
| delete_tool | Delete an F5 object | object_type, object_name |
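Before issuing a call, a script can check it against the table. The sketch below transcribes the Key Arguments column and assumes that every listed argument is required. validate_call is a hypothetical helper:

```python
# Key arguments per tool, transcribed from the table above.
# Assumption: every listed key argument is required for that tool.
TOOL_ARGS = {
    "list_tool": {"object_name", "object_type"},
    "show_stats_tool": {"object_name", "object_type"},
    "show_logs_tool": {"lines_number"},
    "create_tool": {"url_body", "object_type"},
    "update_tool": {"url_body", "object_type", "object_name"},
    "delete_tool": {"object_type", "object_name"},
}

# Object types the table lists for list_tool and show_stats_tool.
OBJECT_TYPES = {"virtual", "pool", "irule", "profile"}


def validate_call(tool_name, args):
    """Return a list of problems with a proposed tool call (an empty list means OK)."""
    if tool_name not in TOOL_ARGS:
        return [f"unknown tool: {tool_name}"]
    missing = sorted(TOOL_ARGS[tool_name] - set(args))
    problems = [f"missing argument: {name}" for name in missing]
    if tool_name in ("list_tool", "show_stats_tool") and args.get("object_type") not in OBJECT_TYPES:
        problems.append(f"unsupported object_type: {args.get('object_type')!r}")
    return problems
```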
Health Check Procedure
Always run health checks in this exact order. Each section builds on the previous one.
Step 1: Virtual Server Inventory and Status
List all virtual servers to establish the baseline inventory.
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"virtual"}'
Extract and report:
- Virtual server names and destination addresses (VIP:port)
- Enabled/disabled state
- Availability status (available, offline, unknown)
- Associated pool name
- IP protocol (TCP, UDP, any)
- Source address translation type (automap, SNAT pool, none)
- Assigned profiles (HTTP, SSL, TCP, persistence)
Flags:
- Virtual server status offline -> CRITICAL: VIP not serving traffic
- Virtual server status unknown -> WARNING: Cannot determine health
- Virtual server disabled -> INFO: Intentionally taken out of service (verify with change records)
- No pool assigned -> WARNING: Virtual server has no backend pool
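The Step 1 flag rules can be expressed as a small classifier. This sketch assumes that each virtual server has already been normalized into a dict with the hypothetical keys name, availability, enabled and pool. You will need to map the raw list_tool and show_stats_tool output to that shape:

```python
def virtual_server_flags(vs):
    """Map one virtual server record to (severity, message) flags per the Step 1 rules.

    Hypothetical normalized input, e.g.
    {"name": "vs_web", "availability": "available", "enabled": True, "pool": "pool_web"}
    """
    flags = []
    name = vs["name"]
    availability = vs.get("availability", "unknown")
    if availability == "offline":
        flags.append(("CRITICAL", f"{name}: VIP not serving traffic"))
    elif availability == "unknown":
        flags.append(("WARNING", f"{name}: cannot determine health"))
    if not vs.get("enabled", True):
        flags.append(("INFO", f"{name}: disabled, verify with change records"))
    if not vs.get("pool"):
        flags.append(("WARNING", f"{name}: no backend pool assigned"))
    return flags
```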
Step 2: Virtual Server Statistics (Per VIP)
For each virtual server discovered in Step 1, collect detailed statistics:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"my_virtual_server","object_type":"virtual"}'
Key metrics to evaluate:
| Metric | HEALTHY | WARNING | CRITICAL |
|---|---|---|---|
| Status availability | available | unknown | offline |
| Current connections | < 80% of connection limit | 80-95% of limit | > 95% of limit or at limit |
| Packets in/out | Non-zero, balanced ratio | Highly asymmetric (>100:1) | Zero in either direction |
| Bits in/out | Non-zero | Sudden drop >50% from baseline | Zero (no traffic flowing) |
| Total requests (HTTP VIPs) | Incrementing | Flat (stalled) | Decreasing or zero |
| Client-side connection rate | Steady or growing | Spike >200% baseline | Zero |
Thresholds:
- Current connections at 0 on a production VIP -> CRITICAL: No clients connecting
- Bits in = 0, bits out > 0 -> WARNING: VIP responding but no client data (possible health monitor traffic only)
- Connection limit reached -> CRITICAL: New clients being rejected (connection queue filling)
- 5xx response count incrementing -> WARNING: Backend servers returning errors
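The connection-limit and packet-ratio rows of the Step 2 table translate directly into threshold checks. This sketch assumes that current connections, the configured limit and the packet counters have already been read from show_stats_tool. The function names are hypothetical:

```python
def connection_utilization(current, limit):
    """Classify current connections against the VIP connection limit (Step 2 table)."""
    if limit <= 0:
        # On BIG-IP a connection limit of 0 means unlimited, so no percentage applies.
        return "N/A"
    pct = 100.0 * current / limit
    if pct > 95:
        return "CRITICAL"  # > 95% of the limit, or at the limit
    if pct >= 80:
        return "WARNING"   # 80-95% of the limit
    return "HEALTHY"       # < 80% of the limit


def packet_balance(pkts_in, pkts_out):
    """Classify the in/out packet ratio (Step 2 table)."""
    if pkts_in == 0 or pkts_out == 0:
        return "CRITICAL"  # zero in either direction
    ratio = max(pkts_in, pkts_out) / min(pkts_in, pkts_out)
    return "WARNING" if ratio > 100 else "HEALTHY"
```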
Step 3: Pool Inventory and Member Health
List all pools and their members:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"pool"}'
Extract and report for each pool:
- Pool name and load balancing method (round-robin, least-connections, ratio, etc.)
- Monitor assignment (HTTP, HTTPS, TCP, ICMP, custom)
- Total members vs active members
- Each member: address:port, state (enabled/disabled), availability (available/offline/unknown)
- Minimum active members setting
- Action on service down (none, reject, drop, reselect)
Flags:
- All members offline -> CRITICAL: Pool is down, no healthy backends
- Members < minimum active threshold -> CRITICAL: Below minimum, failover action triggered
- Any single member offline -> WARNING: Reduced capacity
- More than 50% of members offline -> HIGH: Significant capacity degradation
- Member disabled but not offline -> INFO: Intentionally drained (verify with change records)
- No monitor assigned -> WARNING: Pool health is not being checked
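Here is a sketch of the Step 3 rules applied to a hypothetical normalized pool record. When more than one offline-count rule matches, it reports only the most severe. That is a design choice; the procedure does not specify it:

```python
def pool_flags(pool):
    """Apply the Step 3 rules to a hypothetical normalized pool record, e.g.
    {"name": "pool_web", "monitor": "http", "min_active": 0,
     "members": [{"addr": "10.0.0.1:80", "enabled": True, "availability": "available"}]}
    """
    name, members = pool["name"], pool["members"]
    offline = [m for m in members if m["availability"] == "offline"]
    active = [m for m in members if m["enabled"] and m["availability"] == "available"]
    flags = []
    # Only the most severe offline-count rule is reported.
    if members and len(offline) == len(members):
        flags.append(("CRITICAL", f"{name}: all members offline"))
    elif members and len(offline) > len(members) / 2:
        flags.append(("HIGH", f"{name}: {len(offline)}/{len(members)} members offline"))
    elif offline:
        flags.append(("WARNING", f"{name}: {len(offline)} member(s) offline"))
    if len(active) < pool.get("min_active", 0):
        flags.append(("CRITICAL", f"{name}: below minimum active members"))
    for m in members:
        if not m["enabled"] and m["availability"] != "offline":
            flags.append(("INFO", f"{name}: {m['addr']} disabled (drained)"))
    if not pool.get("monitor"):
        flags.append(("WARNING", f"{name}: no health monitor assigned"))
    return flags
```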
Step 4: Pool Statistics (Per Pool)
For each pool, collect statistics to assess utilization:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"my_pool","object_type":"pool"}'
Key metrics to evaluate:
| Metric | HEALTHY | WARNING | CRITICAL |
|---|---|---|---|
| Active member count | All members active | < 75% active | < 50% active or zero |
| Current connections per member | Evenly distributed | Skewed >3:1 ratio | Single member handling all traffic |
| Server-side connections | Incrementing | Flat | Zero |
| Total requests served | Incrementing | Flat | Decreasing |
| Bytes in/out | Balanced | Asymmetric | Zero |
Connection distribution analysis:
- Even distribution across members -> HEALTHY: Load balancing working correctly
- Uneven distribution with round-robin -> WARNING: Possible persistence override or health issue
- Single member with all connections -> CRITICAL: All other members likely down
- Zero connections on a member -> WARNING: Member may be failing health checks intermittently
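The distribution checks above can be sketched over a list of current-connection counts, one per member. The 3:1 skew threshold comes from the Step 4 table. The function name is hypothetical:

```python
def connection_distribution(conns_per_member, max_skew=3.0):
    """Classify per-member current connections using the Step 4 rules."""
    if sum(conns_per_member) == 0:
        return "N/A", "no connections to analyze"
    busy = [c for c in conns_per_member if c > 0]
    if len(busy) == 1 and len(conns_per_member) > 1:
        return "CRITICAL", "single member handling all traffic"
    if len(busy) < len(conns_per_member):
        return "WARNING", "member with zero connections"
    skew = max(busy) / min(busy)
    if skew > max_skew:
        return "WARNING", f"skewed distribution ({skew:.1f}:1)"
    return "HEALTHY", "even distribution"
```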
Step 5: Profile Inventory
List all profiles to document the configuration posture:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"profile"}'
Check for:
- SSL/TLS profiles: certificate expiration dates, cipher suite strength, TLS version minimums
- HTTP profiles: X-Forwarded-For insertion, response compression, request/response size limits
- TCP profiles: idle timeout values, Nagle algorithm setting, keep-alive intervals
- Persistence profiles: type (cookie, source-addr, SSL), timeout values
- OneConnect profiles: connection pooling settings
Flags:
- SSL cert expiring within 30 days -> WARNING: Plan renewal
- SSL cert expiring within 7 days -> CRITICAL: Immediate renewal required
- SSL cert expired -> CRITICAL: Service will fail for HTTPS clients
- TLS 1.0 or 1.1 enabled -> WARNING: Deprecated protocols, security risk
- Weak cipher suites (RC4, DES, 3DES, export ciphers) -> WARNING: Security vulnerability
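The expiry thresholds in the Step 5 flags can be sketched with the standard datetime module. The sketch assumes that the expiration date has already been read from the certificate the SSL profile references, because a profile listing may not include it. cert_expiry_status is a hypothetical name:

```python
from datetime import date


def cert_expiry_status(expires_on, today=None):
    """Classify a certificate expiration date using the Step 5 thresholds."""
    today = today or date.today()
    days_left = (expires_on - today).days
    if days_left < 0:
        return "CRITICAL", f"expired {-days_left} day(s) ago"
    if days_left <= 7:
        return "CRITICAL", f"expires in {days_left} day(s), renew immediately"
    if days_left <= 30:
        return "WARNING", f"expires in {days_left} days, plan renewal"
    return "HEALTHY", f"expires in {days_left} days"
```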
Step 6: iRule Inventory
List all iRules to document traffic manipulation logic:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"irule"}'
Check for:
- iRules assigned to virtual servers vs orphaned iRules
- iRule event types in use (HTTP_REQUEST, HTTP_RESPONSE, CLIENT_ACCEPTED, etc.)
- Deprecated Tcl commands or known-problematic patterns
- iRules performing logging (potential performance impact at scale)
Flags:
- iRule with log statements in high-traffic path -> WARNING: Performance impact
- iRule using HTTP::collect without HTTP::release -> CRITICAL: Memory leak risk
- Orphaned iRule (not assigned to any virtual server) -> INFO: Cleanup candidate
- iRule with catch blocks -> INFO: Error handling present (good practice)
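The Step 6 patterns can be checked with a rough text scan. This is a keyword heuristic, not a Tcl parser, so a match inside a comment or string still counts. The sketch assumes that each iRule's source text and whether it is assigned to a virtual server have already been extracted. scan_irule is hypothetical:

```python
import re


def scan_irule(name, body, assigned_to_vs):
    """Heuristic keyword scan of one iRule body for the Step 6 flags."""
    flags = []
    if re.search(r"\blog\b", body):
        flags.append(("WARNING", f"{name}: log statements, check traffic volume"))
    if "HTTP::collect" in body and "HTTP::release" not in body:
        flags.append(("CRITICAL", f"{name}: HTTP::collect without HTTP::release"))
    if not assigned_to_vs:
        flags.append(("INFO", f"{name}: not assigned to any virtual server"))
    if re.search(r"\bcatch\b", body):
        flags.append(("INFO", f"{name}: catch blocks present"))
    return flags
```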
Step 7: System Logs Analysis
Pull recent system logs to detect errors and anomalies:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"200"}'
Scan for these critical patterns:
| Pattern | Severity | Meaning |
|---|---|---|
| 01010028 | CRITICAL | No members available for pool |
| 01010029 | CRITICAL | Pool member monitor status down |
| 0107142f | CRITICAL | SSL handshake failure |
| 01070417 | CRITICAL | HTTP parse error |
| 01060102 | HIGH | Connection rate limit reached |
| 01010025 | HIGH | Virtual server connection limit reached |
| 01071681 | WARNING | Pool member has been marked down |
| 01071682 | INFO | Pool member has been marked up |
| 01010240 | WARNING | Connection queue full |
| 0107143c | WARNING | SSL certificate verification failure |
| 01070727 | WARNING | Pool member rate limit reached |
| MCP error | HIGH | Management plane communication issue |
| disk_usage | WARNING | Disk space issue on BIG-IP |
| memory | HIGH | Memory pressure on BIG-IP |
| ha_status | CRITICAL | High availability state change |
| failover | CRITICAL | HA failover event detected |
Log analysis guidelines:
- Group errors by type and count occurrences
- Note timestamps of first and last occurrence
- Identify trending errors (increasing frequency)
- Correlate pool member down events with specific health monitors
- Identify SSL errors that indicate certificate or cipher issues
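Grouping by message ID and tracking the first and last occurrence can be sketched as shown below. The line format is an assumption based on typical BIG-IP log lines: a syslog timestamp followed by an 8-digit hex message ID such as 01010028:3:. Verify it against the actual show_logs_tool output:

```python
import re
from collections import defaultdict

# Assumed line shape, e.g.
# "Jan 15 10:22:01 bigip1 err tmm[123]: 01010028:3: No members available for pool /Common/pool_web"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*?\b(?P<msg_id>[0-9a-fA-F]{8}):\d+:"
)


def summarize_log_ids(lines):
    """Group log lines by message ID: occurrence count plus first and last timestamp."""
    summary = defaultdict(lambda: {"count": 0, "first": None, "last": None})
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines without a recognizable message ID
        entry = summary[m["msg_id"]]
        entry["count"] += 1
        entry["first"] = entry["first"] or m["ts"]
        entry["last"] = m["ts"]
    # Most frequent IDs first.
    return sorted(summary.items(), key=lambda kv: kv[1]["count"], reverse=True)
```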
Step 8: Extended Log Analysis (If Issues Detected)
If Step 7 reveals errors, pull more log lines for deeper analysis:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"1000"}'
Advanced log analysis:
- Correlate timestamps: Did pool member down events coincide with traffic spikes?
- Check for flapping: Is a member repeatedly going up/down? (indicates marginal health)
- Identify blast radius: Which virtual servers were affected by pool member failures?
- Check HA events: Any failover or sync-related messages?
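Flapping can be detected by counting state changes per member across the extended log pull. Both the matched message text ("member ... monitor status down/up") and the threshold of four changes are assumptions chosen for illustration:

```python
import re
from collections import defaultdict

# Assumed message text for member state changes; adjust to the messages seen on your devices.
STATE_RE = re.compile(r"member (?P<member>\S+) monitor status (?P<state>down|up)\b")


def find_flapping(lines, min_changes=4):
    """Return {member: state_changes} for members whose state changed at least min_changes times."""
    last_state = {}
    changes = defaultdict(int)
    for line in lines:
        m = STATE_RE.search(line)
        if not m:
            continue
        member, state = m["member"], m["state"]
        if last_state.get(member) != state:  # count only actual transitions
            changes[member] += 1
            last_state[member] = state
    return {member: n for member, n in changes.items() if n >= min_changes}
```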
Health Report Format
Always produce a summary table after completing all steps:
F5 BIG-IP Health Report
Device: $F5_IP_ADDRESS
Date: YYYY-MM-DD HH:MM UTC
+---------------------------+----------+------------------------------------------+
| Check                     | Status   | Details                                  |
+---------------------------+----------+------------------------------------------+
| Virtual Servers           | HEALTHY  | 5/5 available, all serving traffic       |
| Pool Health               | WARNING  | pool_web: 3/4 members active (node3 dn)  |
| Connection Utilization    | HEALTHY  | Peak VIP at 45% connection limit         |
| Traffic Distribution      | HEALTHY  | Even distribution across pool members    |
| SSL/TLS Profiles          | WARNING  | www_ssl cert expires in 21 days          |
| iRules                    | HEALTHY  | 3 active, no problematic patterns        |
| System Logs               | HIGH     | 47x 01010029 (monitor down) in last hour |
+---------------------------+----------+------------------------------------------+
Overall: HIGH -- 3 items need attention
Action Items:
1. [WARNING] Investigate pool_web node3 health check failures
2. [WARNING] Renew SSL certificate for www_ssl profile (expires in 21 days)
3. [HIGH] Investigate spike in pool member monitor-down log messages
Severity order: CRITICAL > HIGH > WARNING > HEALTHY. Overall status = worst individual status.
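Here is the overall-status rule as code. INFO is not part of the severity order, so this sketch assumes it never raises the overall status:

```python
# Ranks follow the severity order above; INFO is assumed to rank with HEALTHY.
SEVERITY_RANK = {"HEALTHY": 0, "INFO": 0, "WARNING": 1, "HIGH": 2, "CRITICAL": 3}


def overall_status(statuses):
    """Overall status is the worst individual status (HEALTHY if nothing was checked)."""
    return max(statuses, key=SEVERITY_RANK.__getitem__, default="HEALTHY")
```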
Fleet Health Check (Multiple BIG-IP Devices)
When monitoring multiple F5 appliances, run the full procedure on each device and produce a fleet summary:
+------------------+----------+----------+--------+--------+-----------+
| BIG-IP           | Virtuals | Pools    | SSL    | Logs   | Overall   |
+------------------+----------+----------+--------+--------+-----------+
| bigip-prod-02    | HEALTHY  | HEALTHY  | WARNING| HIGH   | HIGH      |
| bigip-prod-01    | HEALTHY  | WARNING  | HEALTHY| HEALTHY| WARNING   |
| bigip-dr-01      | HEALTHY  | HEALTHY  | HEALTHY| HEALTHY| HEALTHY   |
+------------------+----------+----------+--------+--------+-----------+
Sort devices by severity (CRITICAL first) for triage prioritization.
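The following sketch computes each device's overall status and sorts the fleet for triage. The input is a hypothetical dict that maps each device to its per-check statuses:

```python
STATUS_RANK = {"HEALTHY": 0, "WARNING": 1, "HIGH": 2, "CRITICAL": 3}


def fleet_triage(fleet):
    """fleet: {device: {check_name: status}} -> [(device, overall)] sorted worst first."""
    rows = []
    for device, checks in fleet.items():
        # Per-device overall status is the worst individual check status.
        overall = max(checks.values(), key=STATUS_RANK.__getitem__)
        rows.append((device, overall))
    # Worst status first; ties broken by device name for a stable report.
    return sorted(rows, key=lambda row: (-STATUS_RANK[row[1]], row[0]))
```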
Integration with Other Skills
- Use f5-config-mgmt to remediate issues found during health checks (e.g., update pool members, modify monitors)
- Use f5-troubleshoot for deep-dive investigation when health check reveals CRITICAL or HIGH findings
- Use drawio-diagram to visualize the BIG-IP topology (virtual servers -> pools -> members)
- Use markmap-viz to create hierarchical health status mind maps
- Use servicenow-change-workflow to create incidents for CRITICAL findings requiring remediation
GAIT Audit Trail
After completing a health check, record the session in GAIT:
python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"prompt":"F5 BIG-IP health check on '"$F5_IP_ADDRESS"'","response":"Health check completed. Virtual servers: 5/5 HEALTHY. Pools: WARNING (pool_web 3/4 members). SSL: WARNING (cert expires 21 days). Logs: HIGH (47x monitor-down events). Overall: HIGH. Action items: investigate pool_web node3, renew SSL cert, investigate log spike.","artifacts":["f5-health-report.txt"]}'
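The GAIT call can also be made from Python. Building the payload there interpolates the device address correctly, whereas a shell variable inside single quotes is not expanded. This sketch assumes the MCP_CALL and GAIT_MCP_SCRIPT environment variables. record_gait_turn is a hypothetical helper:

```python
import json
import os
import subprocess


def record_gait_turn(device_ip, response_text, artifacts, dry_run=False):
    """Record a health-check session in GAIT.

    With dry_run=True, return the argv instead of executing it.
    """
    payload = json.dumps({
        "prompt": f"F5 BIG-IP health check on {device_ip}",
        "response": response_text,
        "artifacts": artifacts,
    })
    argv = [
        "python3",
        os.environ.get("MCP_CALL", ""),
        "python3 -u " + os.environ.get("GAIT_MCP_SCRIPT", ""),
        "gait_record_turn",
        payload,
    ]
    if dry_run:
        return argv
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```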