debug-rhel
/debug-rhel Skill
Diagnose RHEL system issues by automatically gathering systemd status, journal logs, SELinux denials, and firewall configuration.
Overview
[Connect] → [Identify Service] → [systemd Status] → [Journal Logs] → [SELinux] → [Firewall] → [Summary]
This skill diagnoses:
- systemd service failures
- SELinux access denials (AVC)
- Firewall port blocking
- Permission issues
- Resource constraints
Prerequisites
- SSH access to target RHEL host
- sudo privileges on the target host
- RHEL 8+, CentOS Stream, Rocky Linux, or Fedora
Critical: Human-in-the-Loop Requirements
See Human-in-the-Loop Requirements for mandatory checkpoint behavior.
Note: SSH/Bash Required
This skill operates on remote RHEL hosts via SSH, not local MCP servers. Unlike OpenShift/Podman skills, direct Bash commands with SSH are the correct approach here since MCP servers run locally and cannot access remote systems.
When to Use This Skill
Use /debug-rhel when applications fail on standalone RHEL, Fedora, or CentOS hosts. This skill automates multi-step diagnosis of systemd service failures, SELinux denials, firewall blocking, and system resource problems via SSH.
Workflow
Phase 1: SSH Connection
## RHEL System Debugging
I'll help you diagnose issues on your RHEL system.
**SSH Target:**
[If RHEL_HOST in session state from /rhel-deploy:]
- Using previous connection: [user]@[host]
Is this correct? (yes/no/different host)
[If no RHEL_HOST:]
Please provide your RHEL host details:
| Setting | Value | Default |
|---------|-------|---------|
| Host | [required] | - |
| User | [current user] | $USER |
| Port | 22 | 22 |
**Enter your SSH target:**
WAIT for user to confirm or provide host.
Connection verification:
# Test SSH connection
ssh -o BatchMode=yes -o ConnectTimeout=10 [user]@[host] "echo 'Connection successful'"
If connection fails:
**SSH Connection Failed**
Unable to connect to [host].
**Troubleshooting:**
1. Check host is reachable: `ping [host]`
2. Verify SSH key is configured: `ssh-add -l`
3. Check firewall allows SSH: port 22
4. Verify username is correct
Would you like to:
1. Try a different host
2. Get help with SSH setup
3. Exit
Phase 2: Identify Target Service
## Phase 2: Identify Service
Which service would you like me to debug?
1. **Specify service name** - Enter the systemd unit name
2. **List failed services** - Show failed services on the host
3. **From /rhel-deploy** - Debug the last deployed service
Select an option or enter a service name:
WAIT for user response.
If user selects "List failed services":
# Get failed services
ssh [user]@[host] "systemctl --failed --no-pager"
## Failed Services on [host]
| Unit | Load | Active | Sub | Description |
|------|------|--------|-----|-------------|
| [myapp.service] | loaded | failed | failed | My Application |
| [other.service] | loaded | failed | failed | Other Service |
Which service would you like me to debug?
WAIT for user to select a service.
Phase 3: Get Service Status
# Get detailed service status
ssh [user]@[host] "systemctl status [service] --no-pager -l"
## Service Status: [service-name]
**Status Overview:**
| Field | Value |
|-------|-------|
| Loaded | [loaded/not-found/masked] |
| Active | [active (running)/inactive (dead)/failed] |
| Main PID | [pid or N/A] |
| Status | [status text] |
| Since | [timestamp] |
**Recent Activity:**
[systemctl status output - last 10 lines]
**Quick Assessment:**
[Based on status, provide initial assessment - e.g., "Service failed to start - exit code 1 suggests application error"]
Continue with journal logs? (yes/no)
WAIT for user confirmation before proceeding.
Phase 4: Analyze Journal Logs
# Get service logs
ssh [user]@[host] "journalctl -u [service] -n 100 --no-pager"
## Journal Logs: [service-name]
**Last 100 log entries:**
[journalctl output]
**Log Analysis:**
[Analyze logs and identify errors:]
**Errors Found:**
- [timestamp]: [error - e.g., "Permission denied: /var/data/config.yaml"]
- [timestamp]: [error - e.g., "Connection refused: localhost:5432"]
- [timestamp]: [error - e.g., "Port 8080 already in use"]
**Error Categories:**
| Category | Count | Example |
|----------|-------|---------|
| Permission | [X] | [first occurrence] |
| Connection | [Y] | [first occurrence] |
| Resource | [Z] | [first occurrence] |
Continue to check SELinux? (yes/no/skip)
WAIT for user confirmation before proceeding.
Phase 5: Check SELinux Denials
# Check SELinux status
ssh [user]@[host] "getenforce"
# Get recent AVC denials
ssh [user]@[host] "sudo ausearch -m AVC -ts recent 2>/dev/null || echo 'No recent denials or ausearch not available'"
## SELinux Analysis
**SELinux Status:** [Enforcing/Permissive/Disabled]
**Recent AVC Denials:**
[If denials found:]
| Time | Source | Target | Permission | Denied |
|------|--------|--------|------------|--------|
| [time] | [source_context] | [target_context] | [permission] | [target_file] |
| [time] | [source_context] | [target_context] | [permission] | [target_port] |
**Denial Analysis:**
**Denial 1: [description]**
- **What happened:** Process `[name]` tried to [action] on `[target]`
- **Why denied:** SELinux type `[source_type]` cannot [action] `[target_type]`
- **Impact:** [how this affects the application]
**Recommended Fixes:**
1. **Set SELinux boolean** (if applicable):
```bash
sudo setsebool -P [boolean_name] on
-
Change file context (if file access):
sudo semanage fcontext -a -t [correct_type] "[path](/.*)?" sudo restorecon -Rv [path] -
Allow port (if port binding):
sudo semanage port -a -t [port_type] -p tcp [port]
[If no denials:] No recent SELinux denials found. SELinux is likely not the issue.
Continue to check firewall? (yes/no/skip)
**WAIT for user confirmation before proceeding.**
### Phase 6: Check Firewall
```bash
# Get firewall status
ssh [user]@[host] "sudo firewall-cmd --state 2>/dev/null || echo 'firewalld not running'"
# List firewall rules
ssh [user]@[host] "sudo firewall-cmd --list-all 2>/dev/null"
## Firewall Analysis
**Firewall Status:** [running/not running]
**Active Zone:** [zone-name]
**Current Rules:**
| Type | Value |
|------|-------|
| Services | [ssh, http, https, ...] |
| Ports | [8080/tcp, 3000/tcp, ...] |
| Rich Rules | [count] |
**Application Port:** [detected-port from logs/config]
**Port Status:**
| Port | Protocol | Status |
|------|----------|--------|
| [8080] | TCP | [OPEN/BLOCKED] |
| [443] | TCP | [OPEN/BLOCKED] |
[If port blocked:]
**WARNING: Application port [port] is NOT open in firewall!**
**To open port:**
```bash
sudo firewall-cmd --permanent --add-port=[port]/tcp
sudo firewall-cmd --reload
Or add service:
sudo firewall-cmd --permanent --add-service=[service]
sudo firewall-cmd --reload
Continue to diagnosis summary? (yes/no)
**WAIT for user confirmation before proceeding.**
### Phase 7: Red Hat Insights Check (Optional)
**This phase runs only if the `lightspeed-mcp` server is available.** Use `ToolSearch` to check for Lightspeed MCP tools. If not available, skip this phase silently and proceed to Phase 8.
**Step 1:** Use `find_host_by_name` with the hostname from `RHEL_HOST` to look up the system in Red Hat Insights.
**Step 2:** If system found, use `get_system_cves` with the system ID to check for known CVEs affecting this system.
**Step 3:** Use `get_active_rules` to get advisor configuration recommendations. Optionally use `get_rule_by_text_search` with error text found in Phase 4 logs to find relevant advisor recommendations.
```markdown
## Red Hat Insights Check
**System in Insights:** [Found / Not registered]
[If found:]
**System Details:**
| Field | Value |
|-------|-------|
| Display Name | [hostname] |
| RHEL Version | [version] |
| Last Check-in | [timestamp] |
| Stale | [yes/no] |
**Known Vulnerabilities:**
| CVE | CVSS | Severity | Remediation |
|-----|------|----------|-------------|
| [CVE-ID] | [score] | [severity] | [Available/None] |
**Advisor Recommendations:**
| Rule | Category | Risk | Description |
|------|----------|------|-------------|
| [rule-id] | [Security/Performance/Availability/Stability] | [Critical/Important/Moderate/Low] | [description] |
[If any CVE or advisor rule matches the symptoms from earlier phases:]
**Potentially Related to Current Issue:**
- [CVE or advisor rule that matches the symptoms]
Continue to diagnosis summary? (yes/no)
WAIT for user confirmation before proceeding.
[If system not registered in Insights, just note it:]
## Red Hat Insights Check
System [hostname] is not registered in Red Hat Insights. Skipping vulnerability and advisor checks.
Continue to diagnosis summary? (yes/no)
Phase 8: Present Diagnosis Summary
## Diagnosis Summary: [service-name] on [host]
### Root Cause
**Primary Issue:** [Categorized root cause]
| Category | Status | Details |
|----------|--------|---------|
| Service Unit | [OK/FAIL] | [loaded/enabled status] |
| Application | [OK/FAIL] | [exit code, error] |
| SELinux | [OK/BLOCKED] | [denial count] |
| Firewall | [OK/BLOCKED] | [port status] |
| Permissions | [OK/FAIL] | [file/dir issues] |
| Resources | [OK/FAIL] | [memory/cpu/disk] |
| Insights/CVE | [OK/WARN/N/A] | [CVE count or "Not registered"] |
### Detailed Findings
**[Category 1: e.g., SELinux Denial]**
- Problem: [specific problem - e.g., "httpd_t cannot bind to port 8080"]
- Evidence: [AVC denial message]
- Impact: [application cannot start]
**[Category 2: e.g., Missing Dependency]**
- Problem: [specific problem - e.g., "libpq.so.5 not found"]
- Evidence: [error from logs]
- Impact: [application crashes on startup]
### Recommended Actions
1. **[Action 1 - Highest Priority]** - [description]
```bash
ssh [user]@[host] "[command]"
-
[Action 2] - [description]
ssh [user]@[host] "[command]" -
[Action 3] - [description]
ssh [user]@[host] "[command]"
Verify Fix
After applying fixes:
# Restart service
ssh [user]@[host] "sudo systemctl restart [service]"
# Check status
ssh [user]@[host] "systemctl status [service]"
# View logs
ssh [user]@[host] "journalctl -u [service] -f"
Would you like me to:
- Execute one of the recommended fixes
- Dig deeper into a specific area
- Restart the service
- View live logs
- Exit debugging
Select an option:
**WAIT for user to select next action.**
For common RHEL issues (systemd exit codes, SELinux denials, firewall), see [debugging-patterns.md](../../docs/debugging-patterns.md) and [selinux-troubleshooting.md](../../docs/selinux-troubleshooting.md).
## Dependencies
### Required MCP Servers
- `lightspeed-mcp` (optional) - Red Hat Insights CVE and advisor checks in Phase 7
### Related Skills
- `/rhel-deploy` - redeploy after fixing issues
- `/debug-container` - debug Podman containers on the host
### Reference Documentation
- [docs/selinux-troubleshooting.md](../../docs/selinux-troubleshooting.md) - SELinux denial analysis
- [docs/rhel-deployment.md](../../docs/rhel-deployment.md) - RHEL deployment patterns
- [docs/debugging-patterns.md](../../docs/debugging-patterns.md) - Common error patterns
- [docs/prerequisites.md](../../docs/prerequisites.md) - Required tools and setup