Infrastructure Monitoring

Overview

Implement comprehensive infrastructure monitoring to track system health, performance metrics, and resource utilization with alerting and visualization across your entire stack.

When to Use

  • Real-time performance monitoring
  • Capacity planning and trends
  • Incident detection and alerting
  • Service health tracking
  • Resource utilization analysis
  • Performance troubleshooting
  • Compliance and audit trails
  • Historical data analysis

Quick Start

Minimal working example:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: "infrastructure-monitor"
    environment: "production"

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Rule files
rule_files:
  - "alerts.yml"
  - "rules.yml"

scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
# ... (see reference guides for full implementation)
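
The configuration above loads "alerts.yml" through rule_files, but the file itself is not shown here. As a minimal sketch of what such a file might contain (the group name, alert name, threshold duration, and severity label are illustrative assumptions, not taken from the reference guides), a single rule on the built-in `up` metric could look like this:

```yaml
# alerts.yml (illustrative sketch; names and thresholds are assumptions)
groups:
  - name: infrastructure-alerts        # hypothetical group name
    rules:
      # `up` is set by Prometheus for every scrape target:
      # 1 when the last scrape succeeded, 0 when it failed.
      - alert: InstanceDown            # hypothetical alert name
        expr: up == 0
        for: 5m                        # fire only after 5 minutes of failed scrapes
        labels:
          severity: critical           # assumed label used for routing
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Job {{ $labels.job }} has not been scrapeable for 5 minutes."
```

With the Quick Start's 15s evaluation_interval, the expression is checked every 15 seconds, and the `for` clause keeps a single failed scrape from firing an alert.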

Reference Guides

Detailed implementations in the references/ directory:

  • Prometheus Configuration
  • Alert Rules
  • Alertmanager Configuration
  • Grafana Dashboard
  • Monitoring Deployment
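
The Quick Start points Prometheus at an Alertmanager on localhost:9093. For orientation only, a minimal Alertmanager configuration might look like the sketch below. It is not the content of the Alertmanager Configuration guide; the receiver name, the timing values, and the webhook URL are all hypothetical placeholders.

```yaml
# alertmanager.yml (illustrative sketch; receiver and URL are assumptions)
route:
  receiver: "default"                  # hypothetical receiver name
  group_by: ["alertname", "job"]       # batch related alerts into one notification
  group_wait: 30s                      # wait before sending the first notification for a group
  group_interval: 5m                   # wait before notifying about new alerts in a group
  repeat_interval: 4h                  # re-send interval for alerts that are still firing

receivers:
  - name: "default"
    webhook_configs:
      - url: "http://localhost:5001/alerts"   # hypothetical webhook endpoint
```

Grouping by alertname and job is one reasonable starting point. A real deployment would likely add child routes keyed on labels such as severity.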

Best Practices

✅ DO

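  • Follow established naming and labeling conventions for jobs, metrics, and alerts
  • Keep configuration and rule files clean and maintainable
  • Document what each alert and dashboard is for
  • Test configuration and alert rules thoroughly before deploying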

❌ DON'T

  • Skip testing or validation
  • Ignore error handling
  • Hard-code configuration values
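
One way to avoid hard-coding scrape targets in prometheus.yml is Prometheus's file-based service discovery, which reads targets from separate files and picks up changes without a restart. The sketch below assumes a hypothetical "node" job, a hypothetical targets/ directory, and placeholder host names on port 9100 (the default port of node_exporter). None of these come from the original configuration.

```yaml
# prometheus.yml fragment (illustrative sketch; job name and paths are assumptions)
scrape_configs:
  - job_name: "node"                   # hypothetical job name
    file_sd_configs:
      - files:
          - "targets/*.yml"            # hypothetical directory of target files
        refresh_interval: 1m           # periodic re-read of the target files
```

```yaml
# targets/node.yml (illustrative sketch; host names are placeholders)
- targets:
    - "host-a.example.internal:9100"
    - "host-b.example.internal:9100"
  labels:
    environment: "production"
```

Adding or removing a host then means editing the targets file only. The main configuration stays unchanged.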
First Seen
Jan 21, 2026