f5-troubleshoot
F5 BIG-IP Troubleshooting
Structured troubleshooting methodology for F5 BIG-IP issues. Follow a systematic approach: gather facts from multiple data sources, correlate symptoms, identify root cause, remediate, and verify.
Troubleshooting Principles
- Define the problem -- What exactly is broken? Who reported it? What is the expected vs actual behavior?
- Gather facts -- List objects, check stats, read logs. Never assume.
- Consider possibilities -- Based on facts, list likely root causes
- Create action plan -- Test one variable at a time
- Implement and verify -- Make one change, verify, document
- Document -- Record what was found and what fixed it
How to Call the Tools
The F5 MCP server provides 6 tools. Call them via mcp-call with the required environment variables:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" <tool_name> '{"param":"value"}'
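For example, a concrete call that lists every virtual server (the empty object_name is the same convention the later steps in this skill use to list all objects of a type):
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"virtual"}'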
Available Tools for Troubleshooting
| Tool | Purpose | When to Use |
|---|---|---|
| `list_tool` | List and inspect object configuration | Verify config is correct |
| `show_stats_tool` | Show live statistics and counters | Identify traffic flow issues |
| `show_logs_tool` | Show system logs | Find errors and event correlation |
| `update_tool` | Modify object configuration | Apply fixes |
| `create_tool` | Create new objects | Add missing objects |
| `delete_tool` | Remove objects | Remove problematic objects |
Symptom: "Virtual Server Not Responding to Clients"
Clients report they cannot connect to the application VIP.
Step 1: Verify Virtual Server Exists and Is Enabled
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Check:
- Does the virtual server exist? If not, it was deleted or never created.
- Is it `enabled: true`? If disabled, someone took it out of service.
- Is the `destination` (VIP:port) correct?
- Is a `pool` assigned?
- Is `sourceAddressTranslation` configured? (Without SNAT/automap, return traffic may bypass the BIG-IP.)
Decision tree:
- VS does not exist -> Recreate it (use f5-config-mgmt skill)
- VS is disabled -> Re-enable: `update_tool` with `{"enabled":true}` (example command below)
- VS has no pool -> Assign pool: `update_tool` with `{"pool":"pool_name"}`
- VS has no SNAT -> Check whether the servers use the BIG-IP as their default gateway; if not, add automap
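A minimal sketch of the re-enable fix, assuming the affected virtual server is vs_webapp_https (substitute the real name); the parameter layout follows the other update_tool examples in this skill:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"enabled":true},"object_type":"virtual","object_name":"vs_webapp_https"}'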
Step 2: Check Virtual Server Statistics
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Analyze:
| Metric | Healthy Indicator | Problem Indicator |
|---|---|---|
| Status availability | `available` | `offline` or `unknown` |
| Current connections | > 0 during business hours | 0 on production VIP |
| Total connections | Incrementing | Flat or zero |
| Client-side bits in | > 0 | Zero (no client traffic arriving) |
| Server-side bits out | > 0 | Zero (no traffic reaching backend) |
| Client bits in = 0, server bits out = 0 | - | VIP not processing traffic at all |
| Client bits in > 0, server bits out = 0 | - | Traffic arriving but not forwarded to pool |
If status is offline:
The virtual server is marked down because the associated pool has no available members. Proceed to Step 3.
If current connections = 0 but status is available:
The VIP is healthy but no clients are connecting. The issue is upstream of the BIG-IP:
- DNS not resolving to the VIP address (a quick check is shown after this list)
- Firewall blocking traffic to the VIP
- Client network routing issue
- VIP is on wrong VLAN/subnet
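A quick DNS check from a client-side host confirms whether the record actually points at the VIP (the hostname below is illustrative; use the name clients resolve):
dig +short app.example.com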
Step 3: Check the Associated Pool
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Check:
- Are any members `available`? If all members are `offline`, the pool is down.
- What monitor is assigned? Is it appropriate for the service?
- Are members `enabled` or `disabled`? Disabled members were intentionally drained.
- What is the member-to-connection distribution? Is one member handling all traffic?
If all members are offline -> Go to "Pool Member Marked Down" section below.
Step 4: Check Logs for Errors
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"200"}'
Scan for:
- `01010028` -- No members available in pool (confirms pool down)
- `01010025` -- Connection limit reached on virtual server
- `0107142f` -- SSL handshake failure
- `01070417` -- HTTP parse error
- `01010240` -- Connection queue full
- Timestamps correlating with the reported outage
Step 5: Check Profiles and iRules
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"profile"}'
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"irule"}'
Check:
- Is the correct SSL profile assigned for HTTPS virtual servers?
- Is the HTTP profile assigned when HTTP inspection is needed?
- Are any iRules rejecting or redirecting traffic incorrectly?
- Is a persistence profile causing traffic to stick to a down member?
Symptom: "Pool Member Marked Down"
Health monitor is marking one or more pool members as offline.
Step 1: Identify Which Members Are Down
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Record: Which members are offline, which are available, which are disabled.
Step 2: Check Pool Statistics for the Down Member
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Analyze:
- When did the member go down? (Check stats timestamps)
- Was there a gradual decline or sudden failure?
- Are connections draining from the down member?
Step 3: Check Logs for Monitor Failure Details
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
Scan for these patterns:
| Log Message | Meaning | Common Cause |
|---|---|---|
| `01071681 Pool member ... monitor status down` | Health check failed | Server not responding |
| `01071682 Pool member ... monitor status up` | Health check recovered | Server came back |
| `01010028 No members available` | All members down | Total pool failure |
| `FQDN ... cannot be resolved` | DNS resolution failure | DNS issue for FQDN pool members |
| `monitor ... instance ... timed out` | Monitor timeout | Server too slow or unreachable |
Common root causes for pool member down:
- Server is actually down -- The application crashed, the OS is down, or the server was rebooted
- Network path issue -- Firewall between BIG-IP and server blocking health check traffic, or routing issue on server VLAN
- Monitor mismatch -- HTTP monitor expecting 200 but application returns 301/302 redirect
- Monitor URI wrong -- Health check URI returns 404 because the page does not exist
- Port mismatch -- Monitor checking wrong port (e.g., monitor on 80 but server on 8080)
- SSL mismatch -- HTTP monitor used but server requires HTTPS (or vice versa)
- Response timeout -- Server responds but too slowly for the monitor interval/timeout
- Receive string mismatch -- Monitor expects specific string in response that changed after app deployment
- Source IP issue -- Server firewall blocking the BIG-IP self-IP used for health checks
Step 4: Verify Monitor Configuration
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
From the pool config, identify the monitor name and verify:
- Type: HTTP, HTTPS, TCP, ICMP, or custom
- Interval/timeout: Is the timeout longer than the interval? (Common F5 guideline: timeout = (3 * interval) + 1, so three consecutive checks can fail before the member is marked down.)
- Send string: What request is sent? (e.g., `GET /health HTTP/1.1\r\nHost: app.example.com\r\n\r\n`)
- Receive string: What response is expected? (e.g., `200 OK` or `healthy`)
- Destination: Is it `*:*` (use member address:port) or a specific IP:port?
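If the monitor configuration looks correct, it can help to reproduce the health check by hand from a host that can reach the server, so you see exactly what the application returns. The IP, port, URI, and Host header below are illustrative and should match the pool member and the monitor send string:
curl -i -H "Host: app.example.com" http://10.1.1.10:80/health
A 301/302 redirect or a 404 here, rather than the expected 200, points to a monitor mismatch rather than a dead server.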
Step 5: Remediation
If the server is healthy but the monitor is wrong, fix the monitor:
Update the pool with a correct monitor:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"monitor":"tcp"},"object_type":"pool","object_name":"pool_webapp"}'
If a member needs to be temporarily removed (graceful drain):
Update the pool without the problematic member:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"members":["10.1.1.10:80","10.1.1.11:80"]},"object_type":"pool","object_name":"pool_webapp"}'
WARNING: This removes the member entirely. Existing connections will be terminated. For graceful drain, disable the member instead if the API supports it.
If a replacement member needs to be added:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"members":["10.1.1.10:80","10.1.1.11:80","10.1.1.14:80"]},"object_type":"pool","object_name":"pool_webapp"}'
Symptom: "Connection Limits / Persistence Issues"
Users report intermittent connectivity, session drops, or being load-balanced to a different server mid-session.
Step 1: Check Virtual Server Connection Statistics
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Check for connection limit issues:
- Is `connectionLimit` set and being reached?
- Are `clientsideCurConns` near the limit?
- Is the connection queue filling up? (Check logs for `01010240`)
If connection limit is being hit:
Either increase the limit or scale out with additional pool members:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"connectionLimit":0},"object_type":"virtual","object_name":"vs_webapp_https"}'
Setting connectionLimit to 0 removes the limit entirely.
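Alternatively, raise the limit to a finite value sized from measured peak concurrency rather than removing it entirely (the value below is illustrative):
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"connectionLimit":50000},"object_type":"virtual","object_name":"vs_webapp_https"}'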
Step 2: Check Persistence Configuration
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Persistence troubleshooting:
| Issue | Symptom | Resolution |
|---|---|---|
| No persistence configured | Users lose session on every request | Add cookie or source-addr persistence |
| Source-addr persistence with SNAT | All users from same SNAT IP go to same member | Switch to cookie persistence |
| Cookie persistence but app on HTTP | Persistence cookie not inserted | Ensure HTTP profile is assigned |
| Persistence timeout too short | Users lose session during idle | Increase persistence timeout |
| Persistence timeout too long | Sessions stick to drained member | Lower timeout or use cookie |
| Fallback persistence not set | When primary persistence fails, connections randomize | Set fallback persistence |
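If the resolution is to switch to cookie persistence, a sketch of the change is below. It assumes update_tool passes url_body straight through to the BIG-IP REST API, where a virtual server's persistence is a persist list and cookie is the built-in cookie persistence profile:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"persist":[{"name":"cookie"}]},"object_type":"virtual","object_name":"vs_webapp_https"}'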
Step 3: Check Pool Member Connection Distribution
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
If one member has vastly more connections than others:
- Persistence is sticking too many sessions to one member
- Consider changing from source-address to cookie persistence
- Consider changing load balancing method from round-robin to least-connections
Step 4: Check Logs for Connection Errors
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"300"}'
Scan for:
- `01010025` -- Connection limit reached
- `01010240` -- Connection queue full
- `01060102` -- Rate limit reached
- `TCL error` -- iRule causing connection drops
- `reset cause` -- Connection resets (RST) from server or BIG-IP
Symptom: "SSL/TLS Certificate Problems"
Users see certificate warnings, SSL handshake failures, or HTTPS connections fail entirely.
Step 1: Check SSL Profile Configuration
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"profile"}'
Check the SSL client profile assigned to the virtual server:
- Is a client SSL profile assigned? (Required for HTTPS VIPs)
- Which certificate and key are referenced?
- What TLS versions are enabled? (TLS 1.2 and 1.3 should be enabled; TLS 1.0 and 1.1 should be disabled)
- What cipher suites are configured?
Common SSL issues:
| Issue | Symptom | Log Pattern |
|---|---|---|
| Expired certificate | Browser shows "Not Secure" | 0107142f SSL handshake failed |
| Wrong certificate (hostname mismatch) | Browser shows certificate warning | Client disconnects after handshake |
| Missing intermediate CA | Works in some browsers, fails in others | 0107143c certificate verification failed |
| Weak cipher suite only | Modern browsers refuse to connect | 0107142f with no common cipher |
| TLS version mismatch | Client can't negotiate | 0107142f protocol version |
| Client cert required but not sent | Connection refused | 01071065 peer did not return certificate |
| SNI misconfiguration | Wrong cert served for hostname | Client sees cert for different domain |
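Independently of the BIG-IP, a standard OpenSSL client check from any host shows exactly which certificate and chain the VIP serves for a given SNI hostname (the VIP address and hostname below are placeholders):
openssl s_client -connect <vip-address>:443 -servername app.example.com -showcerts </dev/null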
Step 2: Check Virtual Server for SSL Profile
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Verify the correct SSL profile is assigned in the profiles list with context: clientside.
Step 3: Check Logs for SSL Errors
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"300"}'
Key SSL log messages:
| Log Code | Meaning | Action |
|---|---|---|
| `0107142f` | SSL handshake failure | Check cipher/version/cert compatibility |
| `0107143c` | Certificate verification failure | Check cert chain completeness |
| `01071065` | Peer certificate missing | Client cert auth configured but client has no cert |
| `01070417` | HTTP request on HTTPS port | Client sending plain HTTP to SSL VIP |
| `SSL routines:ssl3_read_bytes:sslv3 alert` | SSL alert received from peer | Version/cipher mismatch |
Step 4: Remediation
Update SSL profile ciphers to modern standards:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"ciphers":"TLSv1.2:TLSv1.3:!SSLv3:!RC4:!3DES:!EXPORT"},"object_type":"profile","object_name":"clientssl_webapp"}'
Assign the correct SSL profile to a virtual server:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"profiles":[{"name":"clientssl_webapp","context":"clientside"},{"name":"http"},{"name":"tcp-wan-optimized","context":"clientside"},{"name":"tcp-lan-optimized","context":"serverside"}]},"object_type":"virtual","object_name":"vs_webapp_https"}'
WARNING: The profiles list is a full replacement. Include ALL desired profiles.
Symptom: "iRule Errors in Logs"
Logs show TCL errors or iRule-related failures.
Step 1: Pull Recent Logs
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
Scan for iRule error patterns:
| Pattern | Meaning | Common Cause |
|---|---|---|
| `TCL error` | Tcl script runtime error | Syntax error, undefined variable, missing command |
| `can't read "variable"` | Variable not defined | Variable used before assignment or in wrong event |
| `command not found` | Invalid Tcl or iRule command | Typo or deprecated command |
| `HTTP::collect` without `HTTP::release` | Payload collection started but never released | Missing release in all code paths (memory leak) |
| `invalid command name "pool"` | Pool command in wrong event | `pool` used outside HTTP_REQUEST event |
| `too many re-entering calls` | Recursive iRule invocation | iRule triggering itself |
| `exceeded CPU time limit` | iRule taking too long | Complex regex or infinite loop |
| `abort` | iRule explicitly aborted | Error condition in catch block |
Step 2: Identify the Problematic iRule
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"irule"}'
Cross-reference the iRule name from the log error with the iRule inventory. Check which virtual servers have this iRule assigned.
Step 3: Review iRule Content
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"problematic_irule","object_type":"irule"}'
Common iRule bugs to check for:
- Variables used across events without being set in all code paths
- `HTTP::collect` without corresponding `HTTP::release` in all branches
- Missing `default` case in `switch` statements
- Regex patterns that can cause catastrophic backtracking
- `log` statements in high-traffic events (performance issue, not error)
- String operations on binary data
- Missing error handling (`catch`) around operations that can fail
Step 4: Fix the iRule
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"apiAnonymous":"when HTTP_REQUEST {\n if { [catch {\n switch -glob -- [string tolower [HTTP::uri]] {\n \"/api/*\" { pool pool_api_backend }\n default { pool pool_webapp }\n }\n } err] } {\n log local0. \"iRule error: $err\"\n pool pool_webapp\n }\n}"},"object_type":"irule","object_name":"uri_routing"}'
Alternatively, if the iRule is causing critical failures, remove it from the virtual server immediately:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"rules":[]},"object_type":"virtual","object_name":"vs_webapp_https"}'
This removes all iRules from the virtual server. Traffic will flow to the default pool without any iRule processing. Fix the iRule, then re-attach it.
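Once the iRule is fixed and validated, re-attach it. Like the profiles and members lists, treat the rules list as a full replacement and include every iRule the virtual server should run:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"rules":["uri_routing"]},"object_type":"virtual","object_name":"vs_webapp_https"}'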
Symptom: "Performance Degradation"
Application is slow, high latency, or throughput has dropped.
Step 1: Check Virtual Server Statistics
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'
Look for:
- Connection count near the limit -> Bottleneck at the VIP
- High bits/sec relative to interface capacity -> Bandwidth saturation
- Connection rate spike -> Possible DDoS or legitimate traffic surge
- Asymmetric traffic (high client-side, low server-side) -> Backend not keeping up
Step 2: Check Pool Member Distribution
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"pool_webapp","object_type":"pool"}'
Look for:
- Uneven connection distribution -> Some members overloaded, others idle
- Single member with most connections -> Persistence issue or members down
- All members at high connection count -> Need more backend capacity
- High server-side connection time -> Backend application slow
If distribution is uneven, consider changing load balancing:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"loadBalancingMode":"least-connections-member"},"object_type":"pool","object_name":"pool_webapp"}'
Step 3: Check for Pool Members Down (Reduced Capacity)
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"pool_webapp","object_type":"pool"}'
If members are down, the remaining members are handling more traffic than designed. This is the most common cause of "slow application" reports -- not a BIG-IP issue but a capacity issue.
Step 4: Check System Logs for Errors
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
Performance-related log patterns:
| Pattern | Meaning | Action |
|---|---|---|
| `01010025` | Connection limit reached | Increase limit or add capacity |
| `01010240` | Connection queue full | Increase queue depth or backend capacity |
| `01060102` | Rate limit reached | Review rate limiting config |
| `01070727` | Pool member rate limit | Member receiving too much traffic |
| `memory` | BIG-IP memory pressure | Check for memory leaks, iRule issues |
| `disk_usage` | BIG-IP disk pressure | Check for log rotation issues |
| `tmm_semaphore` | TMM (Traffic Management Microkernel) contention | BIG-IP itself is overloaded |
| `aggressive_mode` | Memory aggressive mode enabled | BIG-IP is under severe memory pressure |
Step 5: Check iRules for Performance Impact
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"irule"}'
iRule performance killers:
- `log` statements on every request -> Disk I/O bottleneck
- Complex regex matching -> CPU overhead
- `HTTP::collect` on large payloads -> Memory consumption
- `DNS::lookup` in the data path -> Blocking operation, adds latency
- Multiple iRules with the same events -> Event processing overhead
- `persist uie` with large strings -> Persistence table bloat
Step 6: Scale Out (If Root Cause Is Capacity)
If the root cause is insufficient backend capacity, add more pool members:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" update_tool '{"url_body":{"members":["10.1.1.10:80","10.1.1.11:80","10.1.1.12:80","10.1.1.13:80","10.1.1.14:80"]},"object_type":"pool","object_name":"pool_webapp"}'
WARNING: Members list is a full replacement. Include ALL desired members (existing + new).
Symptom: "HA Failover or Sync Issues"
Logs indicate high-availability state changes, failover events, or configuration sync failures.
Step 1: Check System Logs for HA Events
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_logs_tool '{"lines_number":"500"}'
HA-related log patterns:
| Pattern | Severity | Meaning |
|---|---|---|
| `ha_status active -> standby` | CRITICAL | This unit has gone standby -- failover occurred |
| `ha_status standby -> active` | CRITICAL | This unit has become active -- peer failed |
| `failover` | CRITICAL | Failover event in progress |
| `config_sync failed` | HIGH | Configuration not synchronizing between peers |
| `device_trust` | HIGH | Device trust certificate issue |
| `heartbeat lost` | CRITICAL | HA heartbeat lost -- peer may be down |
| `network_failover` | CRITICAL | Network-based failover triggered |
Step 2: Verify Object State After Failover
After any failover event, immediately verify all virtual servers and pools:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"virtual"}'
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" list_tool '{"object_name":"","object_type":"pool"}'
Confirm all virtual servers are available and all pool members are healthy on the now-active unit.
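As a final check that the now-active unit is actually passing traffic, pull statistics for a key virtual server and confirm current and total connections are incrementing:
IP_ADDRESS=$F5_IP_ADDRESS Authorization_string=$F5_AUTH_STRING python3 $MCP_CALL "python3 -u $F5_MCP_SCRIPT" show_stats_tool '{"object_name":"vs_webapp_https","object_type":"virtual"}'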
Common F5 Error Code Quick Reference
| Code | Severity | Meaning | First Action |
|---|---|---|---|
| `01010025` | HIGH | VS connection limit reached | Check stats, increase limit |
| `01010028` | CRITICAL | No pool members available | Check pool health |
| `01010029` | CRITICAL | Pool member monitor down | Check member + monitor |
| `01010240` | HIGH | Connection queue full | Check capacity |
| `01060102` | HIGH | Rate limit reached | Review rate config |
| `0107142f` | CRITICAL | SSL handshake failure | Check cert + ciphers |
| `01070417` | HIGH | HTTP parse error | Check client requests |
| `0107143c` | WARNING | Cert verification fail | Check cert chain |
| `01071681` | WARNING | Pool member marked down | Check member health |
| `01071682` | INFO | Pool member marked up | Recovery event |
| `01070727` | WARNING | Member rate limit | Check distribution |
| `TCL error` | HIGH | iRule error | Check iRule code |
Troubleshooting Decision Flowchart
Client reports application down
|
+-> Check VIP status (list_tool + show_stats_tool virtual)
|
+-> VIP offline?
| +-> Check pool (list_tool + show_stats_tool pool)
| +-> All members down? -> Check servers + monitors
| +-> Some members down? -> Reduced capacity, check remaining
| +-> No pool assigned? -> Assign pool (update_tool)
|
+-> VIP available but 0 connections?
| +-> DNS, firewall, or routing issue upstream of BIG-IP
|
+-> VIP available, connections present, but errors?
+-> Check logs (show_logs_tool)
+-> SSL errors? -> Check profiles + certs
+-> HTTP errors? -> Check iRules + backend health
+-> Connection limits? -> Scale out or increase limits
Integration with Other Skills
| Skill | Integration Point |
|---|---|
| f5-health-check | Run health check first to scope the problem |
| f5-config-mgmt | Apply fixes using proper change workflow |
| servicenow-change-workflow | Create incident tickets for CRITICAL findings |
| drawio-diagram | Visualize traffic flow for complex troubleshooting |
| markmap-viz | Create troubleshooting decision trees |
GAIT Audit Trail
After completing a troubleshooting session, record findings and resolution in GAIT:
python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"prompt":"F5 troubleshoot: vs_webapp_https not responding to clients","response":"Investigation: VIP status offline due to pool_webapp all members down. Root cause: HTTP health monitor expecting 200 but app returning 301 redirect after deployment. Fix: updated monitor receive string to accept 301. Verification: all 3 pool members now available, VIP status available, client connections incrementing. Logs clear of 01010028 errors.","artifacts":["f5-troubleshoot-report.txt"]}'