Linux Administration Skill
Comprehensive Linux system administration and automation for Debian/Ubuntu/Mint environments
Service Management (systemd)
Essential Commands
```bash
# Status and logs
systemctl status service-name
journalctl -u service-name -n 100 --no-pager
journalctl -u service-name --since "1 hour ago"

# Control (pick one action per command)
sudo systemctl start|stop|restart|reload service-name
sudo systemctl enable|disable service-name
sudo systemctl daemon-reload   # After editing unit files
```
Debugging Failed Services
```bash
systemctl status service-name --no-pager -l      # full, untruncated status
journalctl -u service-name -p err --no-pager     # only err and higher priority
systemctl list-dependencies service-name
systemd-analyze verify /etc/systemd/system/service-name.service   # lint the unit file
```
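For scripts, `systemctl show` is easier to parse than `status`: it prints unit properties as `KEY=value` lines. A minimal Python sketch, assuming the output of `systemctl show service-name -p ActiveState -p SubState -p Result -p ExecMainStatus`; the helper names `parse_show`, `unit_state`, and `is_failed` are hypothetical, and the sample text stands in for real output.

```python
import subprocess


def parse_show(output: str) -> dict[str, str]:
    """Parse `systemctl show` KEY=value lines into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def unit_state(unit: str) -> dict[str, str]:
    """Query a unit's state (requires systemd on the host)."""
    out = subprocess.run(
        ["systemctl", "show", unit,
         "-p", "ActiveState", "-p", "SubState",
         "-p", "Result", "-p", "ExecMainStatus"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_show(out)


def is_failed(props: dict[str, str]) -> bool:
    # ActiveState=failed means systemd gave up on the unit;
    # Result records why (e.g. exit-code, signal, timeout)
    return props.get("ActiveState") == "failed"


sample = "ActiveState=failed\nSubState=failed\nResult=exit-code\nExecMainStatus=1\n"
props = parse_show(sample)
print(is_failed(props), props["Result"], props["ExecMainStatus"])
```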
Custom Service Template
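```ini
# /etc/systemd/system/myservice.service
[Unit]
Description=My Service Description
# network.target does not guarantee that interfaces are configured;
# wait for network-online.target if the service connects out at startup
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=user
WorkingDirectory=/path/to/scripts
ExecStart=/path/to/venv/bin/python /path/to/scripts/service.py
# Unbuffered Python output so log lines reach the journal immediately
Environment=PYTHONUNBUFFERED=1
Restart=on-failure
RestartSec=10s
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```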
Automation Patterns
Core Principles
- Error Handling: set -euo pipefail (Bash), try/except (Python)
- Logging: Always log to file AND stdout
- Idempotency: Scripts safe to run multiple times
- Configuration: Use config files, not hardcoded values
- Notifications: Alert on failures, not just successes
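Idempotency usually means checking the current state before changing it. A minimal Python sketch of an idempotent "make sure this line is in a config file" step; `ensure_line` and the `max_retries=3` setting are hypothetical, for illustration only.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` unless an identical line is already present.

    Returns True if the file changed and False if it was already correct,
    so a second run leaves the file untouched.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Keep the file well-formed if it lacks a trailing newline
    prefix = "\n" if text and not text.endswith("\n") else ""
    with path.open("a") as f:  # "a" also creates a missing file
        f.write(prefix + line + "\n")
    return True


with tempfile.TemporaryDirectory() as d:
    cfg = Path(d) / "app.conf"
    print(ensure_line(cfg, "max_retries=3"))  # True: line added
    print(ensure_line(cfg, "max_retries=3"))  # False: already present
    print(cfg.read_text().count("max_retries=3"))  # 1
```

Returning whether anything changed also lets a caller log or notify only on real changes.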
Bash Script Template
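```bash
#!/bin/bash
set -euo pipefail

SCRIPT_NAME="$(basename "$0")"
LOG_FILE="/var/log/${SCRIPT_NAME%.sh}.log"
LOCK_FILE="/tmp/${SCRIPT_NAME%.sh}.lock"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"; }

# Record where the script failed before set -e aborts it
trap 'log "ERROR: command failed at line $LINENO"' ERR

# Single-instance guard: flock holds the lock on fd 9 until the script exits,
# so there is no check-then-create race, and a second instance never deletes
# the first instance's lock. The kernel releases the lock even after kill -9.
exec 9>"$LOCK_FILE"
if ! flock -n 9; then log "ERROR: Already running"; exit 1; fi

log "Starting $SCRIPT_NAME"
# Main logic here
log "Completed $SCRIPT_NAME"
```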
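A Python counterpart to the single-instance lock, using `fcntl.flock`, the system call behind the `flock` utility. A minimal sketch assuming a Linux host; `single_instance` and the lock path are illustrative. Because the lock belongs to the open file description, closing it (or the process dying) releases it, so no stale lock file needs cleaning up.

```python
import fcntl
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path: str):
    """Hold an exclusive, non-blocking flock on lock_path for the with-block.

    Raises BlockingIOError if another process (or another open of the
    same file) already holds the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


if __name__ == "__main__":
    lock = os.path.join(tempfile.gettempdir(), "myscript.lock")
    try:
        with single_instance(lock):
            print("Running")  # main logic here
    except BlockingIOError:
        print("ERROR: Already running", file=sys.stderr)
        sys.exit(1)
```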
Python Script Template
```python
#!/path/to/venv/bin/python
import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/script.log'),
        logging.StreamHandler(sys.stdout),
    ],
)
logger = logging.getLogger(__name__)


def main():
    logger.info("Starting")
    try:
        pass  # Main logic
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback
        logger.exception("Failed")
        sys.exit(1)
    logger.info("Completed")


if __name__ == '__main__':
    main()
```
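To keep values such as the log path out of the code, the template can read them from an INI file with the standard-library `configparser`. A minimal sketch; the file `/etc/myscript.conf`, the `[script]` section, and the `log_file`/`retries` keys are assumptions for illustration.

```python
import configparser
from pathlib import Path

# Hypothetical defaults used when the file or a key is missing
DEFAULTS = {"log_file": "/var/log/script.log", "retries": "3"}


def load_config(path: Path) -> configparser.SectionProxy:
    """Load the [script] section, falling back to DEFAULTS for missing keys."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read(path)  # a missing or unreadable file is silently skipped
    if not parser.has_section("script"):
        parser.add_section("script")
    return parser["script"]


cfg = load_config(Path("/etc/myscript.conf"))
print(cfg["log_file"], cfg.getint("retries"))
```

The loaded `cfg["log_file"]` would then replace the hardcoded path passed to `logging.FileHandler`.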
Cron vs Systemd Timers
| Feature | Cron | Systemd Timer |
|---|---|---|
| Logging | Manual | Automatic (journalctl) |
| Missed runs | Lost | Persistent=true catches up |
| Dependencies | None | Can require other services |
Recommendation: Use systemd timers for production.
Systemd Timer Setup
```ini
# /etc/systemd/system/task.timer
[Unit]
Description=Task Timer

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

```ini
# /etc/systemd/system/task.service
[Unit]
Description=Task

[Service]
Type=oneshot
User=user
ExecStart=/path/to/script.sh
StandardOutput=journal
StandardError=journal
```

```bash
# The timer activates task.service because the names match; only the timer is enabled
sudo systemctl daemon-reload && sudo systemctl enable --now task.timer
systemctl list-timers --all
```
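`OnCalendar=daily` is shorthand for `*-*-* 00:00:00`. With `Persistent=true`, systemd stores the last trigger time on disk and, when the timer is activated (for example at boot), fires the service immediately if a scheduled run fell in the time the timer was inactive. A simplified Python model of that decision, assuming a once-a-day midnight schedule; `next_daily` and `missed_run` are illustrative, not systemd's actual code.

```python
from datetime import datetime, timedelta


def next_daily(after: datetime) -> datetime:
    """Next midnight strictly after `after` (OnCalendar=daily)."""
    midnight = after.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def missed_run(last_trigger: datetime, now: datetime) -> bool:
    """True if a daily trigger fell between last_trigger and now,
    i.e. the case where Persistent=true fires the timer right away."""
    return next_daily(last_trigger) <= now


last = datetime(2025, 10, 1, 0, 0)   # last ran at midnight, Oct 1
boot = datetime(2025, 10, 2, 9, 30)  # machine was off at midnight, Oct 2
print(next_daily(last))              # 2025-10-02 00:00:00
print(missed_run(last, boot))        # True
```

With cron, the same Oct 2 run would simply be skipped.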
Cron Syntax
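```
# minute hour day-of-month month day-of-week command
# Daily at 02:00, appending stdout and stderr to a log
0 2 * * * /path/to/scripts/backup.sh >> /var/log/backup.log 2>&1
# Every 15 minutes
*/15 * * * * /path/to/scripts/health_check.sh
# 18:00, Monday to Friday
0 18 * * 1-5 /path/to/scripts/report.sh
```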
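Each of the five cron fields accepts `*`, a number, a range (`1-5`), a list (`1,15`), or a step (`*/15`, `0-30/10`). A minimal Python matcher for those forms, handy for checking when an expression fires; it is an illustration that assumes numeric fields only (no `MON`/`JAN` names, no `@daily` shortcuts) and does no range validation. As in Vixie cron, day-of-week 0 and 7 both mean Sunday, and when day-of-month and day-of-week are both restricted (neither starts with `*`), a match on either is enough.

```python
from datetime import datetime

# (min, max) for minute, hour, day-of-month, month, day-of-week
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field into the set of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step_n))
    return values


def matches(expr: str, when: datetime) -> bool:
    """True if the 5-field cron expression fires at `when` (minute resolution)."""
    fields = expr.split()
    minute, hour, dom, month, dow = (
        expand(f, lo, hi) for f, (lo, hi) in zip(fields, RANGES)
    )
    if 7 in dow:
        dow.add(0)  # 7 is an alias for Sunday
    cron_dow = (when.weekday() + 1) % 7  # Python: Mon=0; cron: Sun=0
    dom_ok, dow_ok = when.day in dom, cron_dow in dow
    if fields[2].startswith("*") or fields[4].startswith("*"):
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok
    return (when.minute in minute and when.hour in hour
            and when.month in month and day_ok)


print(matches("0 18 * * 1-5", datetime(2025, 10, 3, 18, 0)))  # Friday: True
print(matches("0 18 * * 1-5", datetime(2025, 10, 4, 18, 0)))  # Saturday: False
print(matches("*/15 * * * *", datetime(2025, 10, 4, 9, 45)))  # True
```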
Log Analysis
journalctl Patterns
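```bash
journalctl -n 100 -o short-precise            # last 100 entries, microsecond timestamps
journalctl -f                                 # follow new entries
journalctl --since "2025-10-02 14:00" --until "2025-10-02 15:00"
journalctl -p err                             # priority err and more severe
journalctl -g "error|fail|exception" -n 500   # regex match on the message text
```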
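For scripted log checks, `journalctl -o json` emits one JSON object per line, with fields such as `MESSAGE`, `PRIORITY` (syslog level as a string, 0 = emerg to 7 = debug), `_SYSTEMD_UNIT`, and `__REALTIME_TIMESTAMP` (microseconds since the epoch, also a string). A minimal Python sketch that keeps entries at `err` or worse, like `journalctl -p err`; the sample lines are made up, and non-UTF-8 `MESSAGE` values (which journalctl encodes as arrays of byte values) are passed through unchanged.

```python
import json
from datetime import datetime, timezone

ERR = 3  # syslog priority: 0 emerg, 1 alert, 2 crit, 3 err, 4 warning, ...


def errors(lines):
    """Yield (timestamp, unit, message) for entries at priority err or worse."""
    for line in lines:
        entry = json.loads(line)
        prio = int(entry.get("PRIORITY", 6))  # treat a missing priority as info
        if prio > ERR:
            continue
        ts = datetime.fromtimestamp(
            int(entry["__REALTIME_TIMESTAMP"]) / 1_000_000, tz=timezone.utc
        )
        yield ts, entry.get("_SYSTEMD_UNIT", "?"), entry.get("MESSAGE", "")


sample = [
    '{"__REALTIME_TIMESTAMP": "1759413600000000", "PRIORITY": "6", '
    '"_SYSTEMD_UNIT": "nginx.service", "MESSAGE": "started"}',
    '{"__REALTIME_TIMESTAMP": "1759413660000000", "PRIORITY": "3", '
    '"_SYSTEMD_UNIT": "myservice.service", "MESSAGE": "connection refused"}',
]
for ts, unit, msg in errors(sample):
    print(ts.isoformat(), unit, msg)
```

On a real host, piping `journalctl -u service-name -o json --since "1 hour ago"` into a script that calls `errors(sys.stdin)` would apply the same filter.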
Application Log Patterns
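```bash
# 5xx responses: in nginx's default "combined" format the status code is field 9
# (grepping for " 5xx " also matches response sizes such as 512)
awk '$9 ~ /^5[0-9][0-9]$/' /var/log/nginx/access.log | tail -20
# Slow requests: assumes a custom log_format that ends with $request_time
awk '$NF > 1.0' /var/log/nginx/access.log | tail -20
# Top 10 client IPs
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
```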
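The one-liners above are quick, but the same checks are easier to extend in Python. A minimal sketch that parses nginx's default `combined` format with a regular expression, then counts 5xx responses and top client IPs; it assumes that format and skips lines written with any other `log_format`. The sample lines are made up.

```python
import re
from collections import Counter

# nginx "combined": addr - user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)


def summarize(lines, top=10):
    """Return (number of 5xx responses, most common client IPs)."""
    errors = 0
    ips = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not in combined format
        ips[m["ip"]] += 1
        if m["status"].startswith("5"):
            errors += 1
    return errors, ips.most_common(top)


sample = [
    '203.0.113.5 - - [02/Oct/2025:14:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.5"',
    '203.0.113.5 - - [02/Oct/2025:14:00:02 +0000] "GET /api HTTP/1.1" 502 166 "-" "curl/8.5"',
    '198.51.100.7 - - [02/Oct/2025:14:00:03 +0000] "GET /x HTTP/1.1" 404 0 "-" "curl/8.5"',
]
print(summarize(sample))  # (1, [('203.0.113.5', 2), ('198.51.100.7', 1)])
```

The first sample line has a 512-byte body, which the named `status` group correctly ignores.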
Network Troubleshooting
Diagnostics
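```bash
ping -c 4 google.com          # basic reachability and DNS
nslookup hostname.domain      # DNS lookup
nc -zv host.domain 443        # does a TCP port accept connections?
# -l already limits output to listening sockets (UDP ones show as UNCONN, so
# grepping for LISTEN hides them); sudo shows processes owned by other users
sudo ss -tulpn
ip route show
traceroute host.domain
```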
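`nc -zv host port` only checks whether a TCP handshake completes. A minimal Python equivalent using `socket.create_connection`, handy inside health-check scripts; `port_open` is a hypothetical helper name, and the self-check uses a throwaway local listener instead of a real host.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


if __name__ == "__main__":
    with socket.create_server(("127.0.0.1", 0)) as srv:  # port 0: any free port
        port = srv.getsockname()[1]
        print(port_open("127.0.0.1", port))  # True: something is listening
    print(port_open("127.0.0.1", port))      # False once the listener is closed
```

Note that the timeout covers the connection attempt only; a slow DNS lookup for a hostname is not bounded by it.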
Firewall (ufw)
```bash
sudo ufw status verbose
sudo ufw allow 443/tcp comment 'HTTPS'
sudo ufw allow from 192.0.2.0/24 to any port 22 proto tcp
# Delete by number: list first, check the number, then delete (asks to confirm)
sudo ufw status numbered
sudo ufw delete 5
```
Static IP (netplan)
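```yaml
# /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    ens33:
      addresses: [192.0.2.100/24]
      # gateway4 is deprecated in current netplan; declare a default route instead
      # (older netplan releases: use "to: 0.0.0.0/0")
      routes:
        - to: default
          via: 192.0.2.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
```

```bash
sudo netplan try     # applies, then rolls back unless confirmed (safer over SSH)
sudo netplan apply
```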
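A common static-IP mistake is a gateway outside the interface's subnet, or a host address that is the subnet's network or broadcast address. A minimal Python check with the standard `ipaddress` module, using the example addresses above; `check_static_ip` is hypothetical, it validates values only (it does not parse the YAML), and it assumes an ordinary IPv4 subnet of /30 or larger.

```python
import ipaddress


def check_static_ip(address_cidr: str, gateway: str) -> list[str]:
    """Return a list of problems with a static address/gateway pair."""
    iface = ipaddress.ip_interface(address_cidr)
    net = iface.network
    gw = ipaddress.ip_address(gateway)
    problems = []
    if gw not in net:
        problems.append(f"gateway {gw} is outside {net}")
    if iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    if gw == iface.ip:
        problems.append("gateway equals the host address")
    return problems


print(check_static_ip("192.0.2.100/24", "192.0.2.1"))     # []
print(check_static_ip("192.0.2.100/24", "198.51.100.1"))  # ['gateway 198.51.100.1 is outside 192.0.2.0/24']
```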
Container Management
For Docker/Podman container operations, see the container-testing skill.
Performance Monitoring
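```bash
htop                              # interactive process viewer (apt install htop)
ps aux --sort=-%cpu | head -10    # top CPU consumers
ps aux --sort=-%mem | head -10    # top memory consumers
iostat -x 1 5                     # extended disk stats, 5 samples 1 s apart (package: sysstat)
df -h
du -sh /var/log/* | sort -rh | head -10
find /home -type f -size +100M -exec ls -lh {} +
```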
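The same checks from Python, for scripts that need to act on the results: `shutil.disk_usage` reports the figures `df` shows for one filesystem, and `os.walk` can list files above a size threshold like `find -size +100M`. A minimal sketch; the helper names are illustrative, `human` only approximates `df -h` formatting, and files that cannot be stat'ed are skipped.

```python
import os
import shutil
import stat


def human(n: float) -> str:
    """Format a byte count with 1024-based units, roughly like `df -h`."""
    for unit in ("B", "K", "M", "G"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}T"


def usage_percent(path: str = "/") -> float:
    """Used share of the filesystem holding `path`, using df's Use% ratio:
    used / (used + space available to unprivileged users)."""
    du = shutil.disk_usage(path)
    return 100 * du.used / (du.used + du.free)


def large_files(root: str, min_bytes: int = 100 * 1024 * 1024):
    """Yield (size, path) for regular files under `root` larger than min_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow dir symlinks
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: look at symlinks, not their targets
            except OSError:
                continue  # file vanished or permission denied
            if stat.S_ISREG(st.st_mode) and st.st_size > min_bytes:
                yield st.st_size, path
```

For example, `sorted(large_files("/home"), reverse=True)[:10]` gives the ten largest files, and `human(size)` formats each size for a report.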
Package Management (apt)
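```bash
sudo apt update && sudo apt upgrade -y
sudo apt install package-name -y
sudo apt remove package-name       # keeps config files; apt purge removes them too
sudo apt --fix-broken install      # resolve unmet dependencies
sudo dpkg --configure -a           # finish interrupted package configuration
```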
User and Permission Management
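```bash
sudo adduser username
sudo usermod -aG groupname username   # -a appends; takes effect at next login
sudo chown -R user:group directory/
chmod 755 script.sh                    # rwxr-xr-x
namei -l /path/to/file                 # permissions of every path component
getfacl file && setfacl -m u:username:rwx file
```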
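`namei -l` matters because a process needs execute (search) permission on every parent directory to reach a file, not just read permission on the file itself. A minimal Python sketch of the same walk, printing each component's mode with `stat.filemode` (the `rwx` notation of `ls -l`); `path_modes` is a hypothetical helper, and unlike `namei` it resolves symlinks first instead of listing them.

```python
import os
import stat
from pathlib import Path


def path_modes(path: str) -> list[tuple[str, str]]:
    """Return (mode string, path) for `/` and every component down to `path`."""
    p = Path(path).resolve()
    result = []
    for part in [*reversed(p.parents), p]:
        st = os.stat(part)
        result.append((stat.filemode(st.st_mode), str(part)))
    return result


for mode, part in path_modes("/etc/passwd"):
    print(mode, part)

# chmod 755 on a regular file, in ls -l notation
print(stat.filemode(stat.S_IFREG | 0o755))  # -rwxr-xr-x
```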
Troubleshooting Workflows
Service Won't Start
```bash
systemctl status service-name
journalctl -u service-name -n 50 --no-pager
systemctl cat service-name                # unit file plus any drop-in overrides
namei -l /etc/service-name/config.conf    # can the service user reach the config?
```
High CPU
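```bash
ps aux --sort=-%cpu | head -5
sudo strace -p PID    # what is it doing? Ctrl-C to detach
sudo lsof -p PID      # open files and sockets
kill PID              # SIGTERM first; use kill -9 PID only if it will not exit
```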
Disk Full
```bash
df -h
sudo du -xsh /* 2>/dev/null | sort -rh | head -10   # -x: skip other mounted filesystems
sudo journalctl --vacuum-time=7d
sudo find /tmp -type f -atime +7 -delete
```
Network Issues
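```bash
ip addr show && ip route show
ping -c 4 "$(ip route show default | awk '{print $3; exit}')"   # ping the default gateway
cat /etc/resolv.conf && nslookup google.com
resolvectl status   # upstream DNS when systemd-resolved is used (resolv.conf shows 127.0.0.53)
sudo ufw status
```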
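The ping line above pulls the default gateway out of `ip route`. Scripts can read the same information from `/proc/net/route`, where addresses are hex-encoded in the host's native byte order. A minimal Python sketch, IPv4 only; `default_gateways` is hypothetical, and the `SAMPLE` table is illustrative, written as a little-endian host (x86, most ARM) would print it.

```python
import socket
import struct

RTF_UP, RTF_GATEWAY = 0x1, 0x2  # route flags from the kernel's route.h


def default_gateways(route_table: str) -> list[tuple[str, str]]:
    """Return (interface, gateway IP) for each default route that is up."""
    gateways = []
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gw = fields[0], fields[1], fields[2]
        flags = int(fields[3], 16)
        if dest == "00000000" and flags & RTF_UP and flags & RTF_GATEWAY:
            # "=L": native byte order, matching how the kernel writes the file
            gateways.append((iface, socket.inet_ntoa(struct.pack("=L", int(gw, 16)))))
    return gateways


SAMPLE = (
    "Iface\tDestination\tGateway \tFlags\tRefCnt\tUse\tMetric\tMask\t\tMTU\tWindow\tIRTT\n"
    "ens33\t00000000\t010200C0\t0003\t0\t0\t100\t00000000\t0\t0\t0\n"
    "ens33\t000200C0\t00000000\t0001\t0\t0\t100\t00FFFFFF\t0\t0\t0\n"
)
print(default_gateways(SAMPLE))
```

On a live host, `default_gateways(open("/proc/net/route").read())` returns the current default routes.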
Automation Validation Checklist
Before deploying:
- Script runs successfully manually
- Error handling tested (force failures)
- Logs written and readable
- Idempotent (run twice, same result)
- Timer/cron syntax verified
- Permissions correct
Pattern Recognition: The same 20 commands solve 80% of problems. Good automation disappears. Bad automation wakes you at 2 AM.
Repository: seqis/openclaw-…ude-code