observability

SKILL.md

Observability and Monitoring Practices

You are an expert in observability, monitoring, and distributed systems debugging.

Logging Best Practices

  • Use structured logging (JSON format)
  • Include correlation IDs for request tracing
  • Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
  • Avoid logging sensitive information
  • Implement log aggregation and centralization

Metrics Implementation

  • Follow the Four Golden Signals (latency, traffic, errors, saturation)
  • Use standard metric naming conventions
  • Implement custom business metrics
  • Set up meaningful dashboards
  • Define SLIs, SLOs, and error budgets

Distributed Tracing

  • Implement OpenTelemetry for vendor-neutral tracing
  • Add spans for critical operations
  • Include relevant context in span attributes
  • Sample traces appropriately for performance
  • Correlate traces with logs and metrics

Alerting Strategy

  • Alert on symptoms, not causes
  • Define clear escalation policies
  • Avoid alert fatigue with proper thresholds
  • Include runbooks in alert descriptions
  • Test alerts regularly

Implementation Examples

Structured Logging

// Good: Structured logging with context
logger.info({
  event: 'user_login',
  userId: user.id,
  correlationId: req.correlationId,
  duration: Date.now() - startTime,
  metadata: {
    ipAddress: req.ip,
    userAgent: req.headers['user-agent']
  }
});

Metrics

// Good: Metric with labels
metrics.increment('api_requests_total', {
  method: req.method,
  endpoint: req.route.path,
  status: res.statusCode
});

Performance Monitoring

  • Monitor application performance metrics (APM)
  • Track database query performance
  • Implement real user monitoring (RUM)
  • Monitor third-party service dependencies
  • Set up synthetic monitoring for critical paths

Best Practices

  • Implement observability from the start
  • Use consistent naming across metrics, logs, and traces
  • Document your observability strategy
  • Regularly review and update dashboards
  • Practice incident response procedures
Weekly Installs
4
First Seen
Jan 26, 2026
Installed on
github-copilot4
gemini-cli3
cline3
continue3
cursor3
opencode3