Monitoring Expert
Core Workflow
- Analysis: Understand the monitoring requirements for the application or infrastructure.
- Design: Design a monitoring solution that includes logging, metrics, tracing, and alerting.
- Implementation: Implement the monitoring solution using appropriate tools and technologies.
- Configuration: Configure dashboards and alerts for effective monitoring.
- Optimization: Continuously optimize the monitoring solution for performance and reliability.
- Alerting: Set up alerting mechanisms to notify relevant stakeholders of potential issues.
Reference Guide
Load the detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| Alerting Rules | references/alerting-rules.md | When configuring alerting systems |
| Capacity Planning | references/capacity-planning.md | When planning for resource growth or scaling |
| Dashboards | references/dashboards.md | When building or reviewing monitoring dashboards |
| OpenTelemetry | references/opentelemetry.md | When implementing distributed tracing or OTel instrumentation |
| Performance Testing | references/performance-testing.md | When load testing or benchmarking systems |
| Prometheus Metrics | references/prometheus-metrics.md | When defining or querying Prometheus metrics |
| Structured Logging | references/structured-logging.md | When implementing application logging |
Constraints
MUST DO
- Use structured JSON logging so that logs can be parsed, filtered, and aggregated by machine.
- Include request IDs in logs for traceability.
- Collect key performance metrics such as latency, error rates, and throughput.
- Set up alerts for critical paths.
- Use appropriate metrics aggregation methods (e.g., rate, histogram) based on the metric type.
- Implement health check endpoints so that each service's availability can be monitored.
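The first two requirements above (structured JSON logs that carry a request ID) can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed implementation: the names `request_id_var`, `JsonFormatter`, `build_logger`, `handle_request`, and the logger name `example.service` are all assumptions introduced for the example.

```python
import contextvars
import io
import json
import logging
import uuid

# Hypothetical context variable holding the ID of the request being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Attach the current request ID so log lines can be correlated.
            "request_id": request_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("example.service")  # assumed name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger):
    """Simulate one request: assign an ID, log once, return the ID."""
    request_id = str(uuid.uuid4())
    token = request_id_var.set(request_id)
    try:
        logger.info("request handled")
    finally:
        # Restore the previous value so IDs do not leak across requests.
        request_id_var.reset(token)
    return request_id


if __name__ == "__main__":
    buffer = io.StringIO()
    handle_request(build_logger(buffer))
    print(buffer.getvalue(), end="")
```

A context variable is used here so that the ID follows the request through nested calls without being passed explicitly; a real service would more likely take the ID from an incoming header than generate it locally.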
MUST NOT DO
- Log sensitive information such as passwords or personal data.
- Set up alerts for non-critical issues; they lead to alert fatigue.
- Use default configurations without adapting them to the specific application or infrastructure.
- Ignore monitoring data when troubleshooting issues.
- Over-instrument code to the point of causing performance overhead.
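One way to keep sensitive values out of logs is to mask them before a log line is serialized. The sketch below assumes a simple key-based deny-list; `SENSITIVE_KEYS`, `redact`, and `to_log_line` are hypothetical names, and the set of keys would need to match the fields an application actually handles.

```python
import json

# Hypothetical deny-list of field names that must never reach the logs.
SENSITIVE_KEYS = {"password", "token", "ssn", "email"}
REDACTED = "[REDACTED]"


def redact(value):
    """Return a copy of value with sensitive dictionary keys masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars are returned unchanged; only keyed values are masked.
    return value


def to_log_line(event, fields):
    """Serialize an event and its redacted fields as one JSON line."""
    return json.dumps({"event": event, **redact(fields)})


if __name__ == "__main__":
    print(to_log_line("login", {"user": "alice", "password": "hunter2"}))
```

Key-based masking only catches values stored under known field names; sensitive data embedded in free-text messages would need a separate check.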
Repository
paulund/skills
First Seen
Feb 10, 2026